Abstract

This paper studies the ergodic capacity of wideband multipath channels with limited feedback. Our 
work builds on recent results that have established the possibility of significant capacity gains in the 
wideband/low-SNR regime when there is perfect channel state information (CSI) at the transmitter. 
Furthermore, the perfect CSI benchmark gain can be obtained with the feedback of just one bit per 
channel coefficient. However, the input signals used in these methods are peaky, that is, they have
large peak-to-average power ratios. Signal peakiness is related to channel coherence and many recent
measurement campaigns show that, in contrast to previous assumptions, wideband channels exhibit a 
sparse multipath structure that naturally leads to coherence in time and frequency. In this work, we 
first show that even an instantaneous power constraint is sufficient to achieve the benchmark gain 
when perfect CSI is available at the receiver. In the more realistic non-coherent setting, we study the 
performance of a training-based signaling scheme. We show that multipath sparsity can be leveraged to 
achieve the benchmark gain under both average as well as instantaneous power constraints as long as 
the channel coherence scales at a sufficiently fast rate with signal space dimensions. We also present 
rules of thumb on choosing signaling parameters as a function of the channel parameters so that the 
full benefits of sparsity can be realized. 

I. Introduction 

Recent research on the fundamental limits of wideband/low-SNR communications has focused 
on the non-coherent regime where the impact of channel state information (CSI) on the achievable 
rates is critical. From a capacity perspective, spreading signals has been shown to be
sub-optimal [1], and peaky or flash signaling schemes are necessary [2], [3] to achieve the
non-coherent wideband capacity. Recent work by Zheng et al. [4] has emphasized the crucial role
of channel coherence in the low-SNR regime and the importance of implicit/explicit channel
learning schemes that can bridge the gap between the coherent and the non-coherent extremes.
However, these results have been derived under an implicit assumption of rich multipath,
where the number of independent degrees of freedom (DoF) in the delay domain scales linearly
with bandwidth.

Recent measurement campaigns in the case of ultrawideband systems show that the number
of independent DoF does not scale linearly with bandwidth [5]-[11]. In fact, the physical layer
channel model proposed by the IEEE 802.15 working group for ultrawideband communication 
systems exhibits sparsity in the delay domain (see for example, the measurement data in [12, 
p. 15]). Motivated by these works, we introduced the notion of multipath sparsity in [13] as a 
source of channel coherence and proposed a channel modeling framework to capture the impact 
of sparsity in delay and Doppler on achievable rates. The analysis in [13] shows that multipath 
sparsity can help in reducing/eliminating the need for peaky signaling in achieving wideband 
capacity. 

In this work, we build on the results in [13] and study the impact of channel state feedback
on achievable rates in sparse wideband channels. Although earlier works (for example, [14]-[16]
and references therein) have explored capacity with transmitter CSI, it is only recently [2], [17],
[18] that the impact of feedback in the low-SNR, non-coherent regime has received attention. In
particular, in the low-SNR regime, it is shown in [2], [17] that with an average power constraint,
the capacity gain with perfect transmitter and receiver CSI over the case when there is only
perfect receiver CSI is $\log(1/\mathsf{SNR})$. More interestingly, it is shown that a limited feedback scheme
where only one bit per independent DoF is available at the transmitter can also achieve a gain of
$\log(1/\mathsf{SNR})$ [2], [17]. However, for both the optimal waterfilling scheme [14], [19] as well as the
one bit limited feedback scheme, the input signal tends to be peaky (or bursty) in time, leading
to a high peak-to-average power ratio and difficulties from an implementation standpoint. The
need to reliably estimate the channel at the receiver leads to the use of peaky training followed
by communication in [17]. Similar results have also been reported in [18], where the authors
study the optimization of the training length, average training power and spreading bandwidth
in a wideband setting.

The focus of this work is on leveraging multipath sparsity to overcome or reduce the need
for peaky signaling schemes. We work towards this goal by providing a concise description of
the sparse channel model [13] in Sec. II. We then study the performance in the case where the
receiver has perfect CSI and the transmitter has one bit (per independent DoF) in Sec. III. In
contrast to [2], [17], [18], which study the performance only under an average (or long-term)
power constraint, we also consider an instantaneous (or short-term) power constraint. We restrict
our attention to causal signaling schemes that can be realized in practice. We show that an optimal
threshold of the form $h_t = \lambda \log(1/\mathsf{SNR})$ for any $\lambda \in (0, 1)$ provides a measure of achievable rate¹
which behaves as $(1 + h_t)\,\mathsf{SNR}$ in the wideband limit. Thus when $\lambda$ approaches 1, we achieve
the perfect transmitter CSI capacity, which is the benchmark for all limited feedback schemes.
We derive a sufficient condition under which this benchmark can be approached even with an
instantaneous power constraint. A key parameter that determines this condition is $\mathrm{E}[D_{\mathrm{eff}}]$, the
average number of active independent channel dimensions, that is, the number of independent channel
coefficients that exceed the threshold in the power allocation scheme. In particular, with an
instantaneous power constraint, the benchmark capacity gain is achieved when $\mathrm{E}[D_{\mathrm{eff}}] - h_t \to \infty$
as $\mathsf{SNR} \to 0$. We discuss the feasibility of this condition when the channel is rich as well
as sparse.

In Sec. IV, the focus is on the case where the receiver has no CSI a priori and a
training-based signaling scheme is employed. Along the same lines as in [17], [18], we study the rates
achievable with this scheme, albeit for sparse channels. With an average power constraint, it
is shown that as long as the channel coherence dimension $N_c$ scales with $\mathsf{SNR}$ as $N_c = 1/\mathsf{SNR}^{\mu}$
for some $\mu > 1$, the rate achievable with the training scheme converges to the capacity with
perfect transmitter CSI, the performance benchmark, in the wideband limit. Furthermore, this
condition is achievable only when the channel is sparse and we provide guidelines on choosing
the signal space parameters (signaling/packet duration, bandwidth and transmit power) such
that $\mu > 1$ is realized. The critical role of channel sparsity is further revealed when we impose
an instantaneous power constraint. In contrast to peaky signaling that violates the finiteness
constraint on the peak-to-average power, channel sparsity is necessary to realize the conditions
required to approach the performance gain with an instantaneous power constraint: $\mu > 1$ and
$\mathrm{E}[D_{\mathrm{eff}}] - h_t \to \infty$. We summarize the paper in Sec. V by highlighting our contributions and
placing them in the context of [2], [17], [18].

¹ All logarithms are assumed to be base e and the units for all rate quantities are assumed to be nats per channel use.

II. System Model 

In this section, we elucidate the model developed in [13] for sparse multipath channels. Our 
results are based on an orthogonal short-time Fourier (STF) signaling framework [20], [21] that 
naturally relates multipath sparsity in delay-Doppler to coherence in time and frequency. 

A. Sparse Multipath Channel Modeling 

A discrete, physical multipath channel can be modeled as 

$$ y(t) = \int_{0}^{T_m} \int_{-W_d/2}^{W_d/2} h(\tau, \nu)\, x(t - \tau)\, e^{j 2\pi \nu t}\, d\nu\, d\tau + w(t) \tag{1} $$

$$ h(\tau, \nu) = \sum_{n} \beta_n\, \delta(\tau - \tau_n)\, \delta(\nu - \nu_n), \qquad y(t) = \sum_{n} \beta_n\, x(t - \tau_n)\, e^{j 2\pi \nu_n t} + w(t) \tag{2} $$

where $h(\tau, \nu)$ is the delay-Doppler spreading function of the channel, and $\beta_n$, $\tau_n \in [0, T_m]$ and
$\nu_n \in [-W_d/2, W_d/2]$ denote the complex path gain, delay and Doppler shift associated with the
n-th path. $T_m$ and $W_d$ denote the delay and the Doppler spreads, respectively. The quantities
$x(t)$, $y(t)$ and $w(t)$ denote the transmitted, received and additive white Gaussian noise waveforms,
respectively. Throughout this paper, we assume an underspread channel where $T_m W_d \ll 1$.

We use a virtual representation [22], [23] of the physical model in (2) that captures the 
channel characteristics in terms of resolvable paths and greatly facilitates system analysis from a 
communication-theoretic perspective. The virtual representation uniformly samples the multipath 
in delay and Doppler at a resolution commensurate with signaling bandwidth W and signaling 
duration T, respectively. Thus, we have 

$$ y(t) = \sum_{\ell=0}^{L} \sum_{m=-M}^{M} h_{\ell,m}\, x(t - \ell/W)\, e^{j 2\pi m t / T} + w(t) \tag{3} $$

where $L = \lceil T_m W \rceil$ and $M = \lceil T W_d / 2 \rceil$. The sampled representation (3) is linear and is
characterized by the virtual delay-Doppler channel coefficients $\{h_{\ell,m}\}$ in (4). Each $h_{\ell,m}$ consists
of the sum of gains of all paths whose delay and Doppler shifts lie within the $(\ell, m)$-th
delay-Doppler resolution bin $S_{\tau,\ell} \cap S_{\nu,m}$ of size $\Delta\tau \times \Delta\nu$, $\Delta\tau = 1/W$, $\Delta\nu = 1/T$, as illustrated in
Fig. 1(a). Distinct $h_{\ell,m}$'s correspond to approximately disjoint subsets of paths and are hence
approximately statistically independent. In this work, we assume that the channel coefficients
$\{h_{\ell,m}\}$ are perfectly independent. We also assume² Rayleigh fading in which $\{h_{\ell,m}\}$ are
zero-mean Gaussian random variables.
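
As an illustration of how the virtual coefficients arise, the following Python sketch bins a handful of physical paths into delay-Doppler resolution cells of size $1/W \times 1/T$ and sums the path gains in each cell. The function name `virtual_coefficients`, the nearest-bin rounding rule and the example path parameters are assumptions made for this sketch; they are not taken from [13] or [22].

```python
from collections import defaultdict

def virtual_coefficients(paths, W, T):
    """Sum path gains per delay-Doppler resolution bin.

    paths: iterable of (beta, tau, nu) = (complex gain, delay [s], Doppler [Hz]).
    Returns {(l, m): h_lm}. Bins have width 1/W in delay and 1/T in Doppler;
    assigning each path to its nearest bin is an assumption of this sketch.
    """
    h = defaultdict(complex)
    for beta, tau, nu in paths:
        l = round(tau * W)   # delay bin index
        m = round(nu * T)    # Doppler bin index
        h[(l, m)] += beta
    return dict(h)

# Hypothetical example: three paths, W = 100 MHz, T = 0.1 s
paths = [
    (0.6 + 0.2j, 10e-9, 2.0),
    (0.3 - 0.1j, 12e-9, 3.0),
    (0.5 + 0.0j, 80e-9, -20.0),
]
coeffs = virtual_coefficients(paths, W=100e6, T=0.1)
```

In this example the first two paths are not resolvable at the chosen bandwidth and duration, so their gains add into a single coefficient, while the third path occupies a bin of its own.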

Let $D$ denote the number of non-zero channel coefficients, which reflects the (dominant)
statistically independent DoF in the channel and also signifies the delay-Doppler diversity afforded
by the channel [22]. We decompose $D$ as $D = D_T D_W$, where $D_T$ denotes the Doppler/time
diversity and $D_W$ denotes the frequency/delay diversity. The channel DoF or delay-Doppler
diversity is bounded as

$$ D = D_T D_W \le D_{\max} = D_{T,\max}\, D_{W,\max} \tag{5} $$
$$ D_{T,\max} = \lceil T W_d \rceil, \qquad D_{W,\max} = \lceil T_m W \rceil \tag{6} $$

where $D_{T,\max}$ denotes the maximum Doppler diversity and $D_{W,\max}$ denotes the maximum delay
diversity. Note that $D_{T,\max}$ and $D_{W,\max}$ increase linearly with $T$ and $W$, respectively, and thus
represent a rich multipath environment in which each resolution bin in Fig. 1(a) corresponds to
a dominant channel coefficient.

However, there is growing experimental evidence [5]-[11] that the dominant channel
coefficients get sparser in delay as the bandwidth increases. Furthermore, we are also interested in
modeling scenarios with Doppler effects due to motion. In such cases, as we consider large
bandwidths and/or long signaling durations, the resolution of paths in both delay and Doppler
domains gets finer, leading to the scenario in Fig. 1(a) where the delay-Doppler resolution bins
are sparsely populated with paths, i.e., $D \ll D_{\max}$.

In this work, we model multipath sparsity by a sub-linear scaling of $D_T$ and $D_W$ with $T$ and
$W$, respectively:

$$ D_W \sim g_1(W), \qquad D_T \sim g_2(T) \tag{7} $$

where $g_1$ and $g_2$ are arbitrary sub-linear functions. As a concrete example, we will focus on a
power-law scaling for the rest of this paper:

$$ D_T = (T W_d)^{\delta_1}, \qquad D_W = (W T_m)^{\delta_2} \tag{8} $$

for some $\delta_1, \delta_2 \in (0, 1)$. However, the results derived here hold for any general sub-linear scaling
law. Note that (6) and (7) imply that in sparse multipath, the total number of delay-Doppler
DoF, $D = D_T D_W$, scales sub-linearly with the signal space dimension $N = TW$.

² Note that the Rayleigh fading assumption is used only for mathematical tractability. The general theme of results will continue
to hold as long as the fading distributions have an exponential tail. See [17] for details and [13] for a discussion on modeling
issues.

Remark 1: With perfect CSI at the receiver, the parameter D denotes the delay-Doppler 
diversity afforded by the channel, whereas with no CSI, it reflects the level of channel uncertainty; 
the number of channel parameters that need to be learned at the receiver for coherent processing. 
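
To make the scaling concrete, the sketch below evaluates the power-law model (8) alongside the rich-multipath ceiling in (5) and (6) for given signaling parameters. The helper name `dof_power_law` and all numerical values are illustrative assumptions only.

```python
import math

def dof_power_law(T, W, T_m, W_d, delta1, delta2):
    """Return (D_T, D_W, D, D_max) for the power-law sparsity model.

    D_T and D_W follow (8); D_max is the rich-multipath bound from (5)-(6).
    """
    D_T = (T * W_d) ** delta1
    D_W = (W * T_m) ** delta2
    D_max = math.ceil(T * W_d) * math.ceil(T_m * W)
    return D_T, D_W, D_T * D_W, D_max

# Hypothetical values: T = 0.1 s, W = 1 GHz, T_m = 100 ns, W_d = 100 Hz
D_T, D_W, D, D_max = dof_power_law(0.1, 1e9, 1e-7, 100.0, 0.5, 0.5)

# Quadrupling the bandwidth only doubles D_W when delta2 = 0.5
_, D_W_wide, _, _ = dof_power_law(0.1, 4e9, 1e-7, 100.0, 0.5, 0.5)
```

With these values the sparse model yields a few tens of DoF, far below the roughly one thousand resolution bins available, which is the regime $D \ll D_{\max}$ discussed above.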
Fig. 1. (a) Delay-Doppler sampling commensurate with signaling bandwidth and duration. (b) Time-frequency coherence
subspaces in STF signaling.



B. Orthogonal Short-Time Fourier Signaling 

We consider signaling using an orthonormal short-time Fourier (STF) basis [20], [21] that
is a natural generalization³ of orthogonal frequency-division multiplexing (OFDM) for
time-varying channels. An orthogonal STF basis $\{\phi_{\ell m}(t)\}$ for the signal space is generated from
a fixed prototype waveform $g(t)$ via time and frequency shifts: $\phi_{\ell m}(t) = g(t - \ell T_o)\, e^{j 2\pi m W_o t}$,
where $T_o W_o = 1$, $\ell = 0, \cdots, N_T - 1$, $m = 0, \cdots, N_W - 1$ and $N = N_T N_W = TW$ with
$N_T = T/T_o$, $N_W = W/W_o$. The transmitted signal can be represented as

$$ x(t) = \sum_{\ell=0}^{N_T - 1} \sum_{m=0}^{N_W - 1} x_{\ell m}\, \phi_{\ell m}(t), \qquad 0 \le t \le T \tag{9} $$

where $\{x_{\ell m}\}$ denote the $N$ transmitted symbols that are modulated onto the STF basis waveforms.
The received signal is projected onto the STF basis waveforms to yield

$$ y_{\ell m} = \langle y, \phi_{\ell m} \rangle = \sum_{\ell', m'} h_{\ell m, \ell' m'}\, x_{\ell' m'} + w_{\ell m}. \tag{10} $$

³ STF signaling can be treated as OFDM signaling over a block of OFDM symbol periods with an appropriately chosen symbol
duration.

We can represent the system using an $N$-dimensional matrix equation [20], [21]

$$ \mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{w} \tag{11} $$

where $\mathbf{w}$ is the additive noise vector whose entries are i.i.d. $\mathcal{CN}(0, 1)$. The $N \times N$ matrix $\mathbf{H}$
consists of the channel coefficients $\{h_{\ell m, \ell' m'}\}$ in (10). We assume that the input symbols that
form the transmit codeword $\mathbf{x}$ satisfy an average power constraint

$$ \frac{1}{T}\, \mathrm{E}\left[\|\mathbf{x}\|^2\right] \le P. \tag{12} $$

Since there are $N = TW$ symbols per codeword, we define the parameter $\mathsf{SNR}$ (transmit energy
per modulated symbol) for a given average transmit power $P$ as $\mathsf{SNR} = TP/N = P/W$. In this work,
the focus is on the wideband regime where $\mathsf{SNR} \to 0$ as $W \to \infty$ for a fixed $P$.

For sufficiently underspread channels, the parameters $T_o$ and $W_o$ can be matched to $T_m$ and $W_d$
so that the STF basis waveforms serve as approximate eigenfunctions of the channel [20], [21];
that is, (10) simplifies to⁴ $y_{\ell m} \approx h_{\ell m} x_{\ell m} + w_{\ell m}$. Thus the channel matrix $\mathbf{H}$ is approximately
diagonal. In this work, we assume that $\mathbf{H}$ is exactly diagonal; that is,

$$ \mathbf{H} = \mathrm{diag}\Big[\, \underbrace{h_{11} \cdots h_{1 N_c}}_{\text{Subspace } 1},\; \underbrace{h_{21} \cdots h_{2 N_c}}_{\text{Subspace } 2},\; \cdots,\; \underbrace{h_{D1} \cdots h_{D N_c}}_{\text{Subspace } D} \,\Big]. \tag{13} $$

The diagonal entries of $\mathbf{H}$ in (13) admit an intuitive block fading interpretation in terms of
time-frequency coherence subspaces [20] illustrated in Fig. 1(b). The signal space is partitioned
as $N = TW = N_c D$, where $D$ represents the number of statistically independent time-frequency
coherence subspaces, reflecting the DoF in the channel, and $N_c$ represents the dimension of each
coherence subspace, which we refer to as the coherence dimension. In the block fading model
in (13), the channel coefficients over the $i$-th coherence subspace, $h_{i1}, \cdots, h_{i N_c}$, are assumed to
be identical (denoted by $h_i$), whereas the coefficients across different coherence subspaces are
independent and identically distributed. Thus, the channel is characterized by the $D$ distinct STF
channel coefficients, $\{h_i\}$, that are i.i.d. zero-mean Gaussian random variables (Rayleigh fading)
with (normalized) variance equal to $\mathrm{E}[|h_i|^2] = \sum_n \mathrm{E}[|\beta_n|^2] = 1$ [20].

⁴ The STF channel coefficients are different from the delay-Doppler coefficients, even though we are reusing the same symbols.

Using the DoF scaling for sparse channels in (7), the scaling behavior for the coherence
dimension can be computed as

$$ W_{\mathrm{coh}} = \frac{W}{D_W} \sim f_1(W), \qquad T_{\mathrm{coh}} = \frac{T}{D_T} \sim f_2(T) \tag{14} $$

$$ N_c = W_{\mathrm{coh}}\, T_{\mathrm{coh}} \sim f_1(W)\, f_2(T) \tag{15} $$

where $T_{\mathrm{coh}}$ is the coherence time and $W_{\mathrm{coh}}$ is the coherence bandwidth of the channel, as
illustrated in Fig. 1(b). As a consequence of the sub-linearity of $g_1$ and $g_2$ in (7), $f_1$ and $f_2$ are
also sub-linear. In particular, corresponding to the power-law scaling in (8), we obtain

$$ T_{\mathrm{coh}} = \frac{T^{1-\delta_1}}{W_d^{\delta_1}}, \qquad W_{\mathrm{coh}} = \frac{W^{1-\delta_2}}{T_m^{\delta_2}}. \tag{16} $$

Remark 2: Note that when the channel is sparse, both $N_c$ and $D$ increase sub-linearly with
$N$, whereas when the channel is rich, $D$ scales linearly with $N$, while $N_c$ is fixed.

In this work, the focus is on computing achievable rates in the non-coherent setting with
feedback and, as we will see in Sec. III and IV, the rates turn out to be a function only of
the parameters $N_c$ and $\mathsf{SNR}$. Thus, in order to analyze the low-SNR asymptotics, the following
relation between $N_c$ and $\mathsf{SNR}$ ($= P/W$) plays a key role:

$$ N_c = \frac{1}{\mathsf{SNR}^{\mu}}, \qquad \mu > 0 \tag{17} $$

where the parameter $\mu$ reflects the level of channel coherence. We will revisit (17) and discuss
its achievability and implications in Sec. IV.
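
The relation (17) can be read as a definition of the exponent $\mu$ once $N_c$ and $\mathsf{SNR}$ are fixed. The sketch below computes $N_c$ from the power-law coherence scaling in (16) and then solves (17) for $\mu$. The function name `coherence_exponent` and the parameter values are assumptions of this sketch, and $P$ is taken to be normalized by the noise level so that $\mathsf{SNR} = P/W$.

```python
import math

def coherence_exponent(T, W, P, T_m, W_d, delta1, delta2):
    """Return (N_c, snr, mu) with N_c from (16) and mu solving N_c = snr**(-mu)."""
    T_coh = T ** (1 - delta1) / W_d ** delta1
    W_coh = W ** (1 - delta2) / T_m ** delta2
    N_c = T_coh * W_coh
    snr = P / W                       # requires snr < 1 (low-SNR regime)
    mu = math.log(N_c) / math.log(1.0 / snr)
    return N_c, snr, mu

# Hypothetical values: T = 0.1 s, W = 1 GHz, P = 1e6 (normalized), T_m = 100 ns, W_d = 100 Hz
T, W = 0.1, 1e9
N_c, snr, mu = coherence_exponent(T, W, 1e6, 1e-7, 100.0, 0.5, 0.5)

# Consistency with N = T W = N_c D, where D follows the power law (8)
D = (T * 100.0) ** 0.5 * (W * 1e-7) ** 0.5
```

For these values $\mu$ exceeds 1, which is the regime that Sec. IV identifies as favorable for channel learning.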

III. Achievable Rates with Perfect Receiver CSI and Limited Channel State Feedback

In this section, we study the scenario when there is perfect CSI at the receiver. We assume
throughout this paper that both the transmitter and the receiver have statistical CSI, that is, knowledge
of $T_m$, $W_d$, $g_1$, $g_2$, $f_1$ and $f_2$, so that the scalings of $D$ and $N_c$ are known. On one extreme, with
perfect receiver CSI and no transmitter CSI (no feedback), the coherent capacity per dimension
(in nats/s/Hz) equals

$$ C_{\mathrm{coh},0}(\mathsf{SNR}) = \sup_{\mathbf{Q} \,:\, \mathrm{Tr}(\mathbf{Q}) \le TP} \frac{\mathrm{E}\left[\log\det\left(\mathbf{I}_{N_c D} + \mathbf{H}\mathbf{Q}\mathbf{H}^H\right)\right]}{N_c D}. \tag{18} $$

The optimization is over the set of $N_c D$-dimensional positive definite input covariance matrices
$\mathbf{Q} = \mathrm{E}[\mathbf{x}\mathbf{x}^H]$ satisfying the average power constraint in (12). Due to the diagonal nature of $\mathbf{H}$
in (13), the optimal $\mathbf{Q}$ is also diagonal. Furthermore, with no transmitter CSI, the uniform power
allocation $\mathbf{Q} = \frac{TP}{N_c D}\, \mathbf{I}_{N_c D} = \mathsf{SNR} \cdot \mathbf{I}_{N_c D}$ achieves this optimum. The corresponding capacity in
the limit of low SNR is [2], [4]

$$ C_{\mathrm{coh},0}(\mathsf{SNR}) \approx \mathsf{SNR} - \mathsf{SNR}^2. \tag{19} $$
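
The approximation in (19) can be checked numerically. The sketch below, offered only as an illustration, evaluates $\mathrm{E}[\log(1 + \mathsf{SNR}\,|h|^2)]$ for $|h|^2$ exponentially distributed with unit mean (the Rayleigh fading assumption) using composite Simpson quadrature. The helper names, the truncation point and the step count are arbitrary choices of this sketch.

```python
import math

def simpson(f, a, b, n=6000):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def coherent_rate(snr, x_max=60.0):
    """E[log(1 + snr * x)] for x ~ Exp(1): uniform power, receiver CSI only."""
    return simpson(lambda x: math.log(1 + snr * x) * math.exp(-x), 0.0, x_max)

snr = 0.01
exact = coherent_rate(snr)     # numerically integrated rate in nats per dimension
approx = snr - snr ** 2        # second-order approximation in (19)
```

The two values agree up to a term of order $\mathsf{SNR}^3$, as expected from the Taylor expansion of the logarithm with $\mathrm{E}[|h|^4] = 2$.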

On the other extreme is the case of perfect receiver and transmitter CSI, where the receiver
instantaneously feeds back all the channel coefficients, $\{h_i\}_{i=1}^{D}$, corresponding to the $D$
independent coherence subspaces to the transmitter. The optimum transmitter power allocation in this
case is waterfilling [14], [19] over the different coherence subspaces. In the low-SNR extreme,
it is shown in [2], [17] that the capacity with perfect transmitter CSI scales as $\log(1/\mathsf{SNR})\, \mathsf{SNR}$.
That is, the capacity gain (compared with the receiver CSI only case) is directly proportional to
the waterfilling threshold, $h_w \sim \log(1/\mathsf{SNR})$, and this gain serves as a benchmark for all limited
feedback schemes. More interestingly, it is shown in [2], [17] that this maximum capacity gain
can be achieved with just one bit of feedback per channel coefficient.

In the case of limited feedback, both the transmitter and the receiver have a priori knowledge
of a common threshold denoted by $h_t$. The receiver compares the channel strength ($|h_i|^2$, $i =
1, 2, \cdots, D$) in each coherence subspace with $h_t$, and feeds back

$$ b_i = \begin{cases} 1 & \text{if } |h_i|^2 \ge h_t \\ 0 & \text{if } |h_i|^2 < h_t. \end{cases} \tag{20} $$

At the transmitter, power allocation is uniform across the coherence subspaces for which $b_i = 1$
and no power is allocated to those subspaces for which $b_i = 0$. The input power allocation is
conditioned on the partial CSI available at the transmitter (denoted by CSI), which is $\{b_i\}_{i=1}^{D}$.
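This power allocation, which we still denote by $\mathbf{Q}$ with an abuse of notation, takes the form

$$ \mathbf{Q}(\mathrm{CSI}) = \mathrm{diag}\left(\mathrm{E}[|x_1|^2 \,|\, \mathrm{CSI}],\, \mathrm{E}[|x_2|^2 \,|\, \mathrm{CSI}],\, \cdots,\, \mathrm{E}[|x_N|^2 \,|\, \mathrm{CSI}]\right) \tag{21} $$
$$ \phantom{\mathbf{Q}(\mathrm{CSI})} = \mathrm{diag}\left(q_1, \cdots, q_1,\; q_2, \cdots, q_2,\; \cdots,\; q_D, \cdots, q_D\right) \tag{22} $$

$$ q_i = P_o\, \chi\left(|h_i|^2 \ge h_t\right). \tag{23} $$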

The choice of $P_o$ depends on the type of power constraint and also on the nature of feedback.
To explore this further, let $D_{\mathrm{eff}}$ denote the number of active subspaces, that is, those whose gains exceed the
threshold $h_t$. We have

$$ D_{\mathrm{eff}} = \sum_{i=1}^{D} \chi\left(|h_i|^2 \ge h_t\right) \tag{24} $$

$$ \mathrm{E}[D_{\mathrm{eff}}] \overset{(a)}{=} D\, \mathrm{E}\left[\chi\left(|h_i|^2 \ge h_t\right)\right] \overset{(b)}{=} D\, e^{-h_t} \tag{25} $$

where (a) is due to the fact that $\{h_i\}_{i=1}^{D}$ are i.i.d. and (b) is due to the fact that for a standard
complex Gaussian $h_i$, $\mathrm{E}[\chi(|h_i|^2 \ge h_t)] = \Pr(|h_i|^2 \ge h_t) = e^{-h_t}$.
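
The one-bit feedback rule (20) and the mean in (25) are easy to reproduce by simulation. The following sketch draws $|h_i|^2$ from a unit-mean exponential distribution, which is the distribution of the squared magnitude of a $\mathcal{CN}(0,1)$ variable, forms the feedback bits and averages $D_{\mathrm{eff}}$. The function names, the seed and the parameter values are hypothetical choices for illustration.

```python
import math
import random

def one_bit_feedback(gains_sq, h_t):
    """Feedback bits b_i from (20): 1 if |h_i|^2 >= h_t, else 0."""
    return [1 if g >= h_t else 0 for g in gains_sq]

def simulate_mean_deff(D, h_t, trials=2000, seed=0):
    """Monte Carlo estimate of E[D_eff] over independent codewords."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        gains_sq = [rng.expovariate(1.0) for _ in range(D)]  # |h_i|^2 ~ Exp(1)
        total += sum(one_bit_feedback(gains_sq, h_t))
    return total / trials

D, h_t = 50, 1.0
estimate = simulate_mean_deff(D, h_t)
predicted = D * math.exp(-h_t)   # right-hand side of (25)
```

Raising the threshold $h_t$ thins out the active subspaces exponentially fast, which is what makes the resulting on-off signaling peaky when $D$ is small.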

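If we assume knowledge of $\{b_i\}_{i=1}^{D}$ at the beginning of each codeword, albeit non-causally,
at the transmitter, then we can uniformly divide power among the active subspaces. That is,

$$ P_{o,\mathrm{nc}} = \frac{TP}{N_c D_{\mathrm{eff}}}. \tag{26} $$

The rate achievable with this power allocation, denoted by $C_{\mathrm{coh},1,\mathrm{LT}}(\mathsf{SNR})$, is

$$ C_{\mathrm{coh},1,\mathrm{LT}}(\mathsf{SNR}) = \max_{h_t} \frac{1}{D} \sum_{i=1}^{D} \mathrm{E}\left[\log\left(1 + \frac{TP}{N_c D_{\mathrm{eff}}}\, |h_i|^2\right) \chi\left(|h_i|^2 \ge h_t\right)\right]. \tag{27} $$

The power allocation in (26) satisfies the power constraint instantaneously as well as on average.
To see this, note that

$$ P_{\mathrm{inst,nc}} = \frac{1}{T} \sum_{i=1}^{D} N_c\, q_i = \frac{1}{T} \sum_{i=1}^{D} N_c\, \frac{TP}{N_c D_{\mathrm{eff}}}\, \chi\left(|h_i|^2 \ge h_t\right) = P \tag{28} $$

and clearly $\mathrm{E}[P_{\mathrm{inst,nc}}] = P$ as well. The non-causality of the scheme is more relevant in the
scenario when the receiver estimates the channel coefficients $\{h_i\}_{i=1}^{D}$ and feeds back $\{b_i\}_{i=1}^{D}$
based on these estimates. This motivates us to instead consider a causal power allocation scheme,
one in which for all $i = 1, \cdots, D$, $q_i$ in (23) depends on $h_i$ only through the indicator function
and $P_o$ is independent of $\{h_j\}_{j=1}^{D}$. From (23), we have

$$ \mathrm{E}\left[\|\mathbf{x}\|^2\right] = N_c \sum_{i=1}^{D} \mathrm{E}[q_i] = N_c \sum_{i=1}^{D} P_o \cdot \mathrm{E}\left[\chi\left(|h_i|^2 \ge h_t\right)\right] = N_c\, P_o\, \mathrm{E}[D_{\mathrm{eff}}]. \tag{29} $$
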
i=l i=l 
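Thus to satisfy $\mathrm{E}[\|\mathbf{x}\|^2] \le TP$, the power allocation for the causal scheme is given by

$$ P_{o,\mathrm{c}} = \frac{TP}{N_c\, \mathrm{E}[D_{\mathrm{eff}}]} = \frac{TP}{N_c D\, e^{-h_t}} \tag{30} $$

and the corresponding rate, $\widehat{C}_{\mathrm{coh},1,\mathrm{LT}}(\mathsf{SNR})$, is given by

$$ \widehat{C}_{\mathrm{coh},1,\mathrm{LT}}(\mathsf{SNR}) = \max_{h_t} \frac{1}{D} \sum_{i=1}^{D} \mathrm{E}\left[\log\left(1 + \frac{TP}{N_c D\, e^{-h_t}}\, |h_i|^2\right) \chi\left(|h_i|^2 \ge h_t\right)\right]. \tag{31} $$

The causal power allocation policy in (30) satisfies the average power constraint but can have a
large instantaneous power. This is because

$$ P_{\mathrm{inst,c}} = \frac{1}{T} \sum_{i=1}^{D} N_c\, q_i = \frac{D_{\mathrm{eff}}}{D\, e^{-h_t}}\, P. \tag{32} $$

Thus $\mathrm{E}[P_{\mathrm{inst,c}}] = P$ but, unlike (28), $P_{\mathrm{inst,c}} \in [0, \infty)$ depending on the choice of $h_t$. We will
address this issue in Sec. III-B, but first, we study the average power constraint case more
carefully.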



A. Achievable Rates under Average Power Constraint 

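The following theorem establishes that a threshold of the form $h_t \sim \lambda \log(1/\mathsf{SNR})$ for some
$\lambda \in (0, 1)$ provides the solution to (31).

Theorem 1: Given any $\lambda \in (0, 1)$, a causal on-off signaling scheme under an average power
constraint achieves $C_{\mathrm{LB}} \le \widehat{C}_{\mathrm{coh},1,\mathrm{LT}}(\mathsf{SNR}) \le C_{\mathrm{UB}}$ with an optimal threshold of the form

$$ \lim_{\mathsf{SNR} \to 0} \frac{h_t}{\lambda \log(1/\mathsf{SNR})} = 1 \tag{33} $$

where

$$ C_{\mathrm{UB}} = \mathsf{SNR}^{\lambda} \cdot \left[\log\left(1 + \lambda\, \mathsf{SNR}^{1-\lambda} \log\tfrac{1}{\mathsf{SNR}}\right) + \log\left(1 + \frac{\mathsf{SNR}^{1-\lambda}}{1 + \lambda\, \mathsf{SNR}^{1-\lambda} \log\tfrac{1}{\mathsf{SNR}}}\right)\right] \tag{34} $$

$$ C_{\mathrm{LB}} = \mathsf{SNR}^{\lambda} \cdot \left[\log\left(1 + \lambda\, \mathsf{SNR}^{1-\lambda} \log\tfrac{1}{\mathsf{SNR}}\right) + \frac{1}{2} \log\left(1 + \frac{2\, \mathsf{SNR}^{1-\lambda}}{1 + \lambda\, \mathsf{SNR}^{1-\lambda} \log\tfrac{1}{\mathsf{SNR}}}\right)\right]. \tag{35} $$

Proof: Starting from (31), we have

$$ \widehat{C}_{\mathrm{coh},1,\mathrm{LT}}(\mathsf{SNR}) = \max_{h_t} \frac{1}{D} \sum_{i=1}^{D} \mathrm{E}\left[\log\left(1 + \frac{TP}{N_c D\, e^{-h_t}}\, |h_i|^2\right) \chi\left(|h_i|^2 \ge h_t\right)\right] \tag{36} $$

$$ \overset{(a)}{=} \max_{h_t}\, \mathrm{E}\left[\log\left(1 + \mathsf{SNR}\, e^{h_t}\, |h|^2\right) \chi\left(|h|^2 \ge h_t\right)\right] \tag{37} $$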
(37) 
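where (a) follows from the fact that $\{h_i\}$ are i.i.d. $\mathcal{CN}(0, 1)$ and $h$ is a generic $\mathcal{CN}(0, 1)$
random variable. The expectation in (37) can be computed using [24, 4.337(1), p. 574]. With
$a \triangleq \frac{1 + \mathsf{SNR}\, h_t e^{h_t}}{\mathsf{SNR}\, e^{h_t}}$, we have

$$ \widehat{C}_{\mathrm{coh},1,\mathrm{LT}}(\mathsf{SNR}) = e^{-h_t} \cdot \left[\log\left(1 + \mathsf{SNR}\, h_t e^{h_t}\right) + \exp(a) \int_{a}^{\infty} \frac{e^{-t}}{t}\, dt\right] \tag{38} $$

$$ = e^{-h_t} \cdot \left[\log\left(1 + \mathsf{SNR}\, h_t e^{h_t}\right) + \nu_a\right] \tag{39} $$

where $\nu_a = \exp(a) \int_{a}^{\infty} \frac{e^{-t}}{t}\, dt$. As $a \to \infty$, the following bounds hold for $\nu_a$ [25, 5.1.20, p.
229]:

$$ \frac{1}{2} \log\left(1 + \frac{2}{a}\right) < \nu_a < \log\left(1 + \frac{1}{a}\right). \tag{40} $$

It can be checked that the choice of $h_t$ maximizing (39) is obtained by setting its derivative to
zero and satisfies

$$ \Delta \triangleq 1 - \log\left(1 + \mathsf{SNR}\, h_t e^{h_t}\right) - \frac{1}{\mathsf{SNR}\, e^{h_t}}\, \nu_a = 0. \tag{41} $$

Now, if $h_t$ is such that $\lim_{\mathsf{SNR} \to 0} \frac{h_t}{\lambda \log(1/\mathsf{SNR})} = 1$ for some $\lambda \in (0, 1)$, then as $\mathsf{SNR} \to 0$, we have
$\mathsf{SNR}\, h_t e^{h_t} \to 0$ and $a \to \infty$. Thus using (40), we can approximate $\nu_a$ as $\nu_a \approx \frac{1}{a}$. With this
approximation in (41), we have $\frac{1}{\mathsf{SNR}\, e^{h_t}} \cdot \nu_a \approx \frac{1}{1 + \mathsf{SNR}\, h_t e^{h_t}} \to 1$. Using the choice of $h_t$ as in (33),
it follows that as $\mathsf{SNR} \to 0$, $\Delta \to 0$. Substituting this choice of $h_t$ in (39) and using the upper
and lower bounds on $\nu_a$ in (40), we obtain the bounds in (34) and (35). ■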
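
As a numerical sanity check on Theorem 1, the sketch below evaluates the expectation in (37) at the threshold $h_t = \lambda \log(1/\mathsf{SNR})$ by direct quadrature and compares it with the bounds (34) and (35). It is an illustration under stated assumptions: the helper names, the Simpson rule and its truncation point are choices of this sketch, not part of the proof.

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def causal_rate(snr, h_t, u_max=60.0):
    """E[log(1 + snr*e^{h_t}*x) 1{x >= h_t}] for x ~ Exp(1), cf. (37)."""
    c = snr * math.exp(h_t)
    # Substitute x = h_t + u so that the integral runs over u >= 0
    f = lambda u: math.log(1 + c * (h_t + u)) * math.exp(-(h_t + u))
    return simpson(f, 0.0, u_max)

def theorem1_bounds(snr, lam):
    """Lower and upper bounds (35) and (34) for h_t = lam * log(1/snr)."""
    gain = lam * snr ** (1 - lam) * math.log(1 / snr)   # SNR * h_t * e^{h_t}
    inv_a = snr ** (1 - lam) / (1 + gain)                # 1/a
    lower = snr ** lam * (math.log(1 + gain) + 0.5 * math.log(1 + 2 * inv_a))
    upper = snr ** lam * (math.log(1 + gain) + math.log(1 + inv_a))
    return lower, upper

snr, lam = 0.01, 0.5
h_t = lam * math.log(1 / snr)
lower, upper = theorem1_bounds(snr, lam)
rate = causal_rate(snr, h_t)
```

At this moderate SNR the two bounds already bracket the integrated rate tightly, and all three lie below the first-order expression $(1 + h_t)\,\mathsf{SNR}$, which they approach as $\mathsf{SNR} \to 0$.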
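It can also be shown that the rate achievable with the causal scheme is asymptotically (in
low SNR) the same as the non-causal capacity in (27). That is, $\widehat{C}_{\mathrm{coh},1,\mathrm{LT}}(\mathsf{SNR})$ is a tight bound
to $C_{\mathrm{coh},1,\mathrm{LT}}(\mathsf{SNR})$ and for all $\lambda \in (0, 1)$, we have

$$ \lim_{\mathsf{SNR} \to 0} \frac{\left|C_{\mathrm{coh},1,\mathrm{LT}}(\mathsf{SNR}) - \widehat{C}_{\mathrm{coh},1,\mathrm{LT}}(\mathsf{SNR})\right|}{C_{\mathrm{coh},1,\mathrm{LT}}(\mathsf{SNR})} = 0. \tag{42} $$

The proof of the above statement can be found in Appendix A.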

Corollary 1: The capacity gain for the $D$-bit channel state feedback, causal power allocation
scheme over the capacity with only receiver CSI in (19) is

$$ \lim_{\mathsf{SNR} \to 0} \frac{\widehat{C}_{\mathrm{coh},1,\mathrm{LT}}(\mathsf{SNR})}{C_{\mathrm{coh},0}(\mathsf{SNR})} = (1 + h_t) = 1 + \lambda \log\left(\frac{1}{\mathsf{SNR}}\right). \tag{43} $$

Proof: A Taylor series expansion of the upper and lower bounds in (34) and (35) shows
that they are equal up to first order. This common term is such that

$$ \widehat{C}_{\mathrm{coh},1,\mathrm{LT}}(\mathsf{SNR}) = \mathsf{SNR} + \lambda \log\left(\frac{1}{\mathsf{SNR}}\right) \mathsf{SNR} = (1 + h_t)\, \mathsf{SNR}. \tag{44} $$

On the other hand, with CSI at the receiver alone, we have from (19), $\frac{C_{\mathrm{coh},0}(\mathsf{SNR})}{\mathsf{SNR}} = (1 + o(1))$.
Thus the desired result follows. ■

Remark 3: The capacity gain due to feedback is directly proportional to $h_t$ and the highest
gain is obtained by choosing $\lambda \to 1$, and equals the benchmark where perfect CSI is available at
both the ends [17]. Statements analogous to those in Theorem 1 and Corollary 1 are well-known 
from prior work; see [2], [17], [18] for details. 

We now return to the instantaneous transmit power case described in (32).
Note that as $D \to \infty$, $P_{\mathrm{inst,c}} \to P$ as a consequence of the law of large numbers. However,
for any finite $D$, $P_{\mathrm{inst,c}}$ may be much larger than $P$. This is a serious issue in practical systems
that typically operate with peak power limitations. Thus it is important to analyze the impact of
constraints on the instantaneous power in (32), as discussed next.
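
The concentration of $P_{\mathrm{inst,c}}$ around $P$ for large $D$, and its spread for small $D$, can be seen in a short simulation of (32). The sketch below is only illustrative: the function name `causal_power_stats`, the peak factor of 2 used to count excursions, the seed and the values of $D$ and $h_t$ are assumptions.

```python
import math
import random

def causal_power_stats(D, h_t, peak_factor, trials=2000, seed=1):
    """Monte Carlo statistics of P_inst,c / P from (32) under Rayleigh fading.

    Returns (mean of the ratio, fraction of codewords whose ratio exceeds peak_factor).
    """
    rng = random.Random(seed)
    p_active = math.exp(-h_t)               # Pr(|h_i|^2 >= h_t)
    total, excursions = 0.0, 0
    for _ in range(trials):
        d_eff = sum(rng.expovariate(1.0) >= h_t for _ in range(D))
        ratio = d_eff / (D * p_active)      # P_inst,c / P
        total += ratio
        excursions += ratio > peak_factor
    return total / trials, excursions / trials

mean_small, frac_small = causal_power_stats(D=4, h_t=2.0, peak_factor=2.0)
mean_large, frac_large = causal_power_stats(D=400, h_t=2.0, peak_factor=2.0)
```

Both configurations meet the average constraint, but with only four coherence subspaces a noticeable fraction of codewords exceeds twice the average power, whereas with four hundred such excursions become rare.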

B. Achievable Rates under Instantaneous Power Constraint 

In addition to the average power constraint, let us impose a constraint on the instantaneous
transmit power of the form

$$ P_{\mathrm{inst,c}} \le A P \tag{45} $$

where $A > 1$ is finite. With this short-term constraint, we now compute the rate, $\widehat{C}_{\mathrm{coh},1,\mathrm{ST}}(\mathsf{SNR})$,
achievable with the causal signaling scheme. We are particularly interested in exploring
conditions under which $\widehat{C}_{\mathrm{coh},1,\mathrm{ST}}(\mathsf{SNR})$ approaches $\widehat{C}_{\mathrm{coh},1,\mathrm{LT}}(\mathsf{SNR})$. To this end, we employ the following power
allocation

$$ \mathbf{Q} = \mathrm{diag}\Big(\underbrace{q_1, \cdots, q_1}_{N_c},\; \underbrace{q_2, \cdots, q_2}_{N_c},\; \cdots,\; \underbrace{q_D, \cdots, q_D}_{N_c}\Big) \tag{46} $$

$$ q_i = P_{o,\mathrm{c}}\; \chi\left(|h_i|^2 \ge h_t\right)\, \chi\left(\sum_{j=1}^{i} \chi\left(|h_j|^2 \ge h_t\right) \le A D\, e^{-h_t}\right). \tag{47} $$

The second indicator function in (47) checks for the constraint in (45) causally, during each
time-frequency coherence slot, and allocates power only if this constraint is met. Note that the
choice of $q_i$ in (47) meets the average power constraint with an inequality and hence, $q_i$ can
be enhanced further. On the other hand, the right-hand side of the argument within the second
indicator function has to be reduced by the factor $\frac{T_i}{T}$, where $T_i$ corresponds to the time duration
over which the $i$ coherence subspaces under consideration are encountered. We will not bother
with these secondary issues in the ensuing analysis. We then have

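$$ \widehat{C}_{\mathrm{coh},1,\mathrm{ST}}(\mathsf{SNR}) = \frac{1}{D} \sum_{i=1}^{D} \mathrm{E}\left[\log\left(1 + \frac{TP}{N_c D\, e^{-h_t}}\, |h_i|^2\, \chi\left(|h_i|^2 \ge h_t\right) \chi\left(\sum_{j=1}^{i} \chi\left(|h_j|^2 \ge h_t\right) \le A D\, e^{-h_t}\right)\right)\right] $$

$$ = \frac{1}{D} \sum_{i=1}^{D} \mathrm{E}\left[\log\left(1 + \mathsf{SNR} \cdot e^{h_t} \cdot |h_i|^2\, \chi\left(|h_i|^2 \ge h_t\right)\right) \chi\left(\sum_{j=1}^{i} \chi\left(|h_j|^2 \ge h_t\right) \le A D\, e^{-h_t}\right)\right] $$

$$ \overset{(a)}{=} \frac{1}{D} \sum_{i=1}^{D} \Pr\left(\sum_{j=1}^{i} \chi\left(|h_j|^2 \ge h_t\right) \le A D\, e^{-h_t}\right) \cdot \mathrm{E}\left[\log\left(1 + \mathsf{SNR} \cdot e^{h_t} \cdot |h|^2\, \chi\left(|h|^2 \ge h_t\right)\right)\right] $$

$$ = \frac{\sum_{i=1}^{D} \Pr\left(\sum_{j=1}^{i} \chi\left(|h_j|^2 \ge h_t\right) \le A D\, e^{-h_t}\right)}{D} \cdot \mathrm{E}\left[\log\left(1 + \mathsf{SNR} \cdot e^{h_t} \cdot |h|^2\, \chi\left(|h|^2 \ge h_t\right)\right)\right] $$

$$ = \widehat{C}_{\mathrm{coh},1,\mathrm{LT}}(\mathsf{SNR}) \cdot \frac{\sum_{i=1}^{D} p_i}{D} $$

where $\widehat{C}_{\mathrm{coh},1,\mathrm{LT}}(\mathsf{SNR})$ is the rate achievable with only an average power constraint, and (a) follows
from the fact that $\{h_i\}$ are i.i.d. and

$$ p_i \triangleq \Pr\left[\sum_{j=1}^{i} \chi\left(|h_j|^2 \ge h_t\right) \le A D\, e^{-h_t}\right]. \tag{48} $$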

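Thus, characterizing $\widehat{C}_{\mathrm{coh},1,\mathrm{ST}}(\mathsf{SNR})$ is equivalent to computing $p_i$. In particular, under what
condition does $\frac{1}{D}\sum_{i=1}^{D} p_i \to 1$? This is discussed in the following proposition.
Proposition 1: With $h_t \sim \lambda \log(1/\mathsf{SNR})$ as in (33), we have $\frac{1}{D}\sum_{i=1}^{D} p_i \ge L$ where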



T ~ 1 4 £>(l-A/2) 

SNR^l+SNRV^" 3 " (l+SNR A /4) l) 



if 1 < A < 2, and if A > 2, we have 

L « 1 - 



SNR A (1 + SNR A /4) D(A ir 
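In particular, if

$$ \mathrm{E}[D_{\mathrm{eff}}] - h_t = D\, e^{-h_t} - h_t \sim D\, \mathsf{SNR}^{\lambda} + \lambda \log(\mathsf{SNR}) \to \infty \quad \text{as } \mathsf{SNR} \to 0, $$

then $L \to 1$ for all $A > 1$ and $\widehat{C}_{\mathrm{coh},1,\mathrm{ST}}(\mathsf{SNR}) \to \widehat{C}_{\mathrm{coh},1,\mathrm{LT}}(\mathsf{SNR})$.
Proof: See Appendix B.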



(49) 



(50) 



(51) 
C. Discussion: Rich vs. Sparse Multipath

The result of Theorem 1 implies that the rate achievable with the $D$-bit channel state feedback
scheme approaches the benchmark, the perfect transmitter CSI capacity, when $\lambda \to 1$.
Furthermore, this benchmark can be attained in the wideband limit, even when there is an instantaneous
power constraint. As described in Prop. 1, $\mathrm{E}[D_{\mathrm{eff}}] - h_t \to \infty$ provides a sufficient condition.
We now discuss the feasibility of satisfying these conditions when the channel is rich and when
it is sparse. The behavior of $\mathrm{E}[D_{\mathrm{eff}}]$ provides key insights in this regard.
A1) Rich multipath: For a rich channel, from (6) we note that $D$ scales linearly with $T$ and $W$.
For a fixed $T$, $D \sim \mathsf{SNR}^{-1}$ (since $\mathsf{SNR} = P/W$). That is, $\mathrm{E}[D_{\mathrm{eff}}] - h_t = D\, \mathsf{SNR}^{\lambda} + \lambda \log(\mathsf{SNR}) \to \infty$
for $0 < \lambda < 1$. We can thus conclude that for rich multipath the perfect CSI benchmark is attained
trivially with both average and instantaneous power constraints.

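A2) Sparse multipath: From the power-law scaling in (8), ignoring the constant factors, we
have $D \sim T^{\delta_1} W^{\delta_2}$ and therefore

$$ \mathrm{E}[D_{\mathrm{eff}}] - h_t \sim T^{\delta_1}\, \mathsf{SNR}^{\lambda - \delta_2} + \lambda \log(\mathsf{SNR}). \tag{52} $$

For a fixed $T$, as $\mathsf{SNR} \to 0$, we have

$$ \mathrm{E}[D_{\mathrm{eff}}] - h_t \to \begin{cases} \infty & \text{if } 0 < \lambda < \delta_2 \\ -\infty & \text{if } \delta_2 \le \lambda < 1. \end{cases} \tag{53} $$

While we can approach the benchmark capacity with an average power constraint, (53) suggests
a cap on $\lambda$, and hence on the highest achievable gain, with an instantaneous power constraint.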
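
The dichotomy in (53) is visible numerically well before the asymptotic limit. The sketch below evaluates $D\,\mathsf{SNR}^{\lambda} + \lambda \log(\mathsf{SNR})$ with $D$ given by the power law (8) for a fixed $T$. The function names are hypothetical, and the example uses normalized units ($P = T = T_m = W_d = 1$) purely to expose the exponents.

```python
import math

def sparse_dof(snr, P, T, T_m, W_d, delta1, delta2):
    """D = D_T * D_W from (8) with bandwidth W = P / snr."""
    W = P / snr
    return (T * W_d) ** delta1 * (W * T_m) ** delta2

def deff_minus_ht(snr, lam, D):
    """E[D_eff] - h_t for h_t = lam * log(1/snr), i.e. D*snr**lam + lam*log(snr)."""
    return D * snr ** lam + lam * math.log(snr)

def sweep(lam, delta2, snrs=(1e-6, 1e-12)):
    """Evaluate the condition at decreasing SNR values (normalized units)."""
    return [deff_minus_ht(s, lam, sparse_dof(s, 1.0, 1.0, 1.0, 1.0, 0.5, delta2))
            for s in snrs]

below_cap = sweep(lam=0.3, delta2=0.5)   # lam < delta2: grows without bound
above_cap = sweep(lam=0.8, delta2=0.5)   # lam > delta2: drifts to minus infinity
```

With $\delta_2 = 0.5$, a target of $\lambda = 0.3$ satisfies the sufficient condition of Prop. 1 as the bandwidth grows, while $\lambda = 0.8$ does not.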

D. Capacity Optimal Packet Configurations 

From (53), we see that the perfect CSI gain is not always achievable when there is an 
instantaneous power constraint. However, we note that (53) is derived assuming a fixed choice 
of T, while we know that sparsity in Doppler facilitates any desired scaling in the DoF with 
increasing T. Leveraging both delay and Doppler sparsities, we propose the following solution 
to get around the restriction in A2. Instead of signaling with a fixed duration T, let us suppose 
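that we maintain a scaling relationship for $T$ as a function of $W$. For example, let $T \sim W^{\rho}$ for
some $\rho > 0$. Consequently, $D \sim T^{\delta_1} W^{\delta_2} \sim W^{\delta_2 + \rho \delta_1}$ and we have

$$ \mathrm{E}[D_{\mathrm{eff}}] - h_t \sim \mathsf{SNR}^{\lambda - \delta_2 - \rho \delta_1} + \lambda \log(\mathsf{SNR}). \tag{54} $$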
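Thus in the limit as $\mathsf{SNR} \to 0$, the asymptotic behavior of $\mathrm{E}[D_{\mathrm{eff}}] - h_t$ is given by

$$ \mathrm{E}[D_{\mathrm{eff}}] - h_t \to \begin{cases} \infty & \text{if } 0 < \lambda < \delta_2 + \rho \delta_1 \\ -\infty & \text{if } \lambda \ge \delta_2 + \rho \delta_1. \end{cases} \tag{55} $$

We can therefore choose

$$ \rho \ge \frac{1 - \delta_2}{\delta_1} \tag{56} $$

which consequently leads to the desired result that $\mathrm{E}[D_{\mathrm{eff}}] - h_t \to \infty$ for all $\lambda \in (0, 1)$. Thus
the benchmark gain is achievable even under an instantaneous power constraint.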
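
The requirement on $\rho$ follows directly from (54): the first term diverges as $\mathsf{SNR} \to 0$ whenever $\lambda - \delta_2 - \rho\delta_1 < 0$, and asking for this at every $\lambda \in (0, 1)$ gives $\rho \ge (1 - \delta_2)/\delta_1$. The sketch below computes this smallest admissible $\rho$ and the resulting split of the signal space dimension $N = TW$ between duration and bandwidth implied by $T \sim W^{\rho}$. The function names are hypothetical and constant factors are ignored.

```python
def min_rho(delta1, delta2):
    """Smallest rho for which lam - delta2 - rho*delta1 < 0 for every lam in (0, 1)."""
    return (1.0 - delta2) / delta1

def snr_exponent(lam, rho, delta1, delta2):
    """Exponent of SNR in the first term of (54)."""
    return lam - delta2 - rho * delta1

def packet_exponents(rho):
    """Exponents (a, b) with T ~ N**a and W ~ N**b, from T ~ W**rho and N = T*W."""
    return rho / (1.0 + rho), 1.0 / (1.0 + rho)

# Example: equal sparsity in delay and Doppler, delta1 = delta2 = 0.5
rho = min_rho(0.5, 0.5)
t_exp, w_exp = packet_exponents(rho)
```

For $\delta_1 = \delta_2 = 0.5$ this yields $\rho = 1$, so duration and bandwidth grow at the same rate; sparser channels require $T$ to grow faster relative to $W$.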

To further illustrate this idea, we present an example when channel sparsity follows the
power-law scaling in (8). For simplicity, let us assume that $\delta_1 = \delta_2 = \delta$. From (56), we require $T \sim W^{\rho}$
with $\rho \ge \frac{1-\delta}{\delta}$ to achieve the benchmark performance. With $N = TW$, the capacity optimal
$(T, W)$ packet configuration is then given by

$$ T \sim N^{\frac{\rho}{1+\rho}}, \qquad W \sim N^{\frac{1}{1+\rho}}. \tag{57} $$

Fig. 2 illustrates the optimal packet configuration relationship for a rich multipath channel ($\delta \to
1$), for a medium sparse channel ($\delta = 0.5$) and for a very sparse channel ($\delta \to 0$). These configurations show that
in sparse multipath channels, the perfect CSI capacity gain is achievable with limited feedback
under both average and instantaneous constraints on the transmission power by appropriate
signaling strategies. These guidelines can be easily extended to generic sub-linear scaling laws.

Fig. 2. Optimal packet configurations with perfect receiver CSI and limited feedback as a function of richness of the channel.
Three cases are illustrated here: rich multipath ($\delta \to 1$), medium sparsity ($\delta = 0.5$) and very high sparsity ($\delta \to 0$).

IV. Achievable Rates with Channel Estimation at the Receiver

In contrast to the perfect receiver CSI case, we now consider the more realistic case where
no CSI is available a priori. We first consider only an average power constraint and show that
the first-order term of the benchmark capacity can be achieved if the channel is sparse and the
channel coherence dimension, $N_c$, scales with $\mathsf{SNR}$ at an appropriate rate, allowing the receiver
to learn the channel reliably. We also show that this is infeasible when the channel is rich, due
to poor channel estimation.

More specifically, the focus here is on a training-based signaling scheme where the
transmitted signals include training symbols to enable channel estimation and coherent detection.
The restriction to training schemes is motivated by their easy realizability. The total energy
available for training and communication is $PT$, of which a fraction $\eta$ is used for training and
the remaining fraction $(1 - \eta)$ is used in communication. With the block fading model, this
means that one signal space dimension in each coherence subspace is used for training and
the remaining $(N_c - 1)$ are used in communication. This is pictorially illustrated in Fig. 3. We
consider minimum mean-squared error (MMSE) channel estimation and the reader is referred
to [13, Sec. II-C] for more details on the training scheme.

A. Achievable Rates under Average Power Constraint 

Let $C_{\mathrm{train},1,\mathrm{LT}}(\mathrm{SNR})$ denote the average mutual information achievable (per-dimension) with 
the causal training scheme under the average power constraint. We proceed along the same lines 
as the no feedback case [13, Lemma 1] to characterize $C_{\mathrm{train},1,\mathrm{LT}}(\mathrm{SNR})$. Let $H$ be the actual 
channel, $\hat{H}$ be the estimated channel and $\Delta = H - \hat{H}$ denote the estimation error matrix. We 
begin with the following well-known lower bound [26] to $C_{\mathrm{train},1,\mathrm{LT}}(\mathrm{SNR})$: 



$$C_{\mathrm{train},1,\mathrm{LT}}(\mathrm{SNR}) \ge \sup_{Q} \frac{E\left[\log\det\left(I_{(N_c-1)D} + \hat{H} Q \hat{H}^{H} \left(I + \Sigma_{\Delta x}\right)^{-1}\right)\right]}{N_c D} \tag{58}$$

where the supremum is over $\{Q : \mathrm{Tr}(Q) \le (1 - \eta)TP\}$. The optimal $Q$ is again diagonal and, 
analogous to (23), equals 

$$Q = \mathrm{diag}\big(\underbrace{q_1, \cdots, q_1}_{N_c-1}, \underbrace{q_2, \cdots, q_2}_{N_c-1}, \cdots, \underbrace{q_D, \cdots, q_D}_{N_c-1}\big) \tag{59}$$

$$q_i = \frac{(1-\eta)TP \, \chi\left(|\hat{h}_i|^2 > h_t^{\mathrm{train}}\right)}{(N_c-1) D \, E\left[\chi\left(|\hat{h}_i|^2 > h_t^{\mathrm{train}}\right)\right]} \tag{60}$$

where $h_t^{\mathrm{train}}$ is the threshold in the training case. The following theorem describes conditions 
under which the rates achievable with the training scheme converge to those in the coherent 
case. 

Fig. 3. Training-based signaling scheme in the STF domain. The D estimated channel coefficients determine the D feedback 
bits for the communication scheme with limited feedback. 

Theorem 2: If $N_c = \frac{1}{\mathrm{SNR}^{\mu}}$ for some $\mu > 1$, then 

$$\lim_{\mathrm{SNR} \to 0} \frac{C_{\mathrm{train},1,\mathrm{LT}}(\mathrm{SNR})}{C_{\mathrm{coh},1,\mathrm{LT}}(\mathrm{SNR})} = 1. \tag{61}$$

Proof: Using the choice of $Q$ from (60) in (58) and proceeding along the lines of (48), 
we obtain 

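$$C_{\mathrm{train},1,\mathrm{LT}}(h_t^{\mathrm{train}}, \eta, N_c, \mathrm{SNR}) = \kappa_1 \left[ \log\left(1 + \frac{(1-\eta)(1+\eta N_c \mathrm{SNR})\, h_t^{\mathrm{train}}\, \mathrm{SNR}}{(1-\eta)\mathrm{SNR} + \kappa_1\kappa_2}\right) + \nu_{\frac{(1-\eta)(1+\eta N_c \mathrm{SNR})\, h_t^{\mathrm{train}}\, \mathrm{SNR} + (1-\eta)\mathrm{SNR} + \kappa_1\kappa_2}{\eta(1-\eta) N_c \mathrm{SNR}^2}} \right] \tag{62}$$
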

Ki — e 



if v, snr , H ., = ;/( .y,. - 1)SNR +(!-_) (63) 



where $\nu_{\bullet}$ is as defined following (39). The tightest lower bound to (62) is obtained by maximizing 
$C_{\mathrm{train},1,\mathrm{LT}}(h_t^{\mathrm{train}}, \eta, N_c, \mathrm{SNR})$ over $\eta$, the fraction of energy spent on training, and over $h_t^{\mathrm{train}}$: 

$$C_{\mathrm{train},1,\mathrm{LT}}(\mathrm{SNR}) = \max_{h_t^{\mathrm{train}}} \max_{\eta} \; C_{\mathrm{train},1,\mathrm{LT}}(h_t^{\mathrm{train}}, \eta, N_c, \mathrm{SNR}). \tag{64}$$

Performing the optimization in (64) seems difficult. Motivated by our study in Sec. III, we now 
assume a specific form for the threshold: $h_t^{\mathrm{train}} = \epsilon \log\left(\frac{1}{\mathrm{SNR}}\right)$. It is shown in Appendix C that 
with this choice of $h_t^{\mathrm{train}}$, the optimal choice for $\eta$ and $N_c$ can be obtained in closed form and 
the desired result in (61) is established. 

Alternatively, we demonstrate a sub-optimal, but simpler approach that suffices to obtain (61). 
This approach uses the choice of $\eta$ that optimizes the average mutual information in the no 
feedback case [13, Lemma 2]. This choice, denoted by $\eta^*$, is given as 

$$\eta^* = \frac{N_c \mathrm{SNR} + N_c - 1}{(N_c - 2) N_c \mathrm{SNR}} \left[ \sqrt{1 + \frac{N_c \mathrm{SNR} (N_c - 2)}{N_c \mathrm{SNR} + N_c - 1}} - 1 \right]. \tag{65}$$
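
To make the behavior of the training fraction concrete, the following Python sketch evaluates the closed form in (65). It is an illustration, not part of the proof: the function names and the value of $\mu$ in the demo are hypothetical. With $a = N_c\mathrm{SNR} + N_c - 1$ and $b = (N_c - 2)N_c\mathrm{SNR}$, the closed form reads $\eta^* = (a/b)(\sqrt{1 + b/a} - 1)$, which is algebraically the positive root of $b\eta^2 + 2a\eta - a = 0$; the second helper checks that identity numerically.

```python
import math


def eta_star(n_c, snr):
    """Training energy fraction from (65); requires n_c > 2 and snr > 0."""
    a = n_c * snr + n_c - 1.0
    b = (n_c - 2.0) * n_c * snr
    return (a / b) * (math.sqrt(1.0 + b / a) - 1.0)


def stationarity_residual(n_c, snr):
    """Residual of b*eta^2 + 2*a*eta - a, which vanishes at eta_star."""
    a = n_c * snr + n_c - 1.0
    b = (n_c - 2.0) * n_c * snr
    eta = eta_star(n_c, snr)
    return b * eta * eta + 2.0 * a * eta - a


if __name__ == "__main__":
    mu = 1.5  # hypothetical coherence exponent, N_c = SNR^(-mu)
    for snr in (1e-1, 1e-2, 1e-3):
        n_c = snr ** (-mu)
        print(f"SNR={snr:g}  N_c={n_c:.1f}  eta*={eta_star(n_c, snr):.4f}")
```

Since $\eta^* = 1/(1 + \sqrt{1 + b/a})$, the fraction always lies below one half, and it shrinks as $N_c\mathrm{SNR}$ grows, which is the regime selected by $\mu > 1$.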



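Let $h_t^{\mathrm{train},*} = \frac{\eta^* N_c \mathrm{SNR}}{1 + \eta^* N_c \mathrm{SNR}} h_t$ where $h_t = \lambda \log\left(\frac{1}{\mathrm{SNR}}\right)$, $\kappa_1^* = \kappa_1 |_{\eta^*, h_t^{\mathrm{train},*}}$ and $\kappa_2^* = \kappa_2 |_{\eta^*}$. If we 
define 

$$A_1 = \frac{(1-\eta^*)(1+\eta^* N_c \mathrm{SNR})\, h_t^{\mathrm{train},*}\, \mathrm{SNR}}{(1-\eta^*)\mathrm{SNR} + \kappa_1^*\kappa_2^*}, \tag{66}$$

$$A_2 = \frac{(1-\eta^*)(1+\eta^* N_c \mathrm{SNR})\, h_t^{\mathrm{train},*}\, \mathrm{SNR} + (1-\eta^*)\mathrm{SNR} + \kappa_1^*\kappa_2^*}{\eta^*(1-\eta^*) N_c \mathrm{SNR}^2}, \tag{67}$$

it is cumbersome, but straightforward to show that 

$$\lim_{\mathrm{SNR} \to 0} A_1 = 0 \quad \text{and} \quad \lim_{\mathrm{SNR} \to 0} \frac{1}{A_2} = 0 \tag{68}$$

for any $\mu > 0$. From (62), we then have 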



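$$\max_{h_t^{\mathrm{train}},\, \eta} C_{\mathrm{train},1,\mathrm{LT}}(h_t^{\mathrm{train}}, \eta, N_c, \mathrm{SNR}) \ge C_{\mathrm{train},1,\mathrm{LT}}(h_t^{\mathrm{train},*}, \eta^*, N_c, \mathrm{SNR}) \tag{69}$$

$$= \kappa_1^* \left[ \log(1 + A_1) + \nu_{A_2} \right] \tag{70}$$

$$\overset{(a)}{\ge} \kappa_1^* \left[ \log(1 + A_1) + \frac{1}{2} \log\left(1 + \frac{2}{A_2}\right) \right] \tag{71}$$

$$\overset{(b)}{\approx} \kappa_1^* \left[ A_1 + \frac{1}{A_2} \right] \tag{72}$$

where (a) follows from (40) and (b) is the low-SNR approximation to (71). Substituting for 
$h_t^{\mathrm{train},*}$ and simplifying, we can reduce the lower bound in (72) to 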

N r \ ( r]*N c SNR 



C train ,i,L T (SNR) >(l-?7* 



[l + h t ]SNR. 



(73) 



N c -lJ \l + r t *N c SMR / 
Substituting for $\eta^*$ from (65) and $N_c = \frac{1}{\mathrm{SNR}^{\mu}}$, it can be checked that when $\mu > 1$ the leading 
term is $[1 + h_t]\,\mathrm{SNR}$ which equals the first-order term of the coherent capacity as described by 
Corollary 1. On the other hand when /i < 1, the leading term takes the form O (SNR^) and 
hence, $\mu > 1$ is necessary. ■ 
Having established the result with an average power constraint, let us consider the instantaneous 
power constraint case. 



B. Achievable Rates under Instantaneous Power Constraint 

We impose a constraint as in (45) for the communication phase of the training scheme. With 
the same power allocation scheme as in (47) (Sec. III-B), we obtain 



train, 1, ST 



(SNR) 



1 \ 1 " 



i=l 



log 1 + 



IhiWjl + E tr ) 
l + qi + E t 



x 



tr 



X 



h* ra ' n (l+r,iV e SNR) 

ADe »?iVcSNR 



j'=i 



= a 



train, 1,LT 



(SNR) 



D train 



D 



where E tr = r]N c SNR and pf ain = Pr £ x(\hj\ 2 > h* rain ) < 



J =1 



h^ ain (l+r,iV e SNR) 
ADe yN c SNR 

(i-v) 



(74) 



(75) 



Understanding when $C_{\mathrm{train},1,\mathrm{ST}}(\mathrm{SNR})$ approaches $C_{\mathrm{train},1,\mathrm{LT}}(\mathrm{SNR})$ is similar to the case studied in Sec. III-B. Taking recourse to the 
analysis of Prop. 1 by using a threshold of the form $h_t^{\mathrm{train},*} = \frac{\eta^* N_c \mathrm{SNR}}{1 + \eta^* N_c \mathrm{SNR}} h_t$ where $\eta^*$ is as in (65) 
and $h_t = \lambda \log\left(\frac{1}{\mathrm{SNR}}\right)$, it can be shown that $p_L^{\mathrm{train}}$ is lower bounded by the same expression 
as in (49) and (50) with $A$ replaced by $\frac{A}{1-\eta^*}$. After some simplifications, we can conclude that 
if $\frac{E[D_{\mathrm{eff}}]}{1-\eta^*} - h_t \to \infty$, then $C_{\mathrm{train},1,\mathrm{ST}}(\mathrm{SNR}) \to C_{\mathrm{train},1,\mathrm{LT}}(\mathrm{SNR})$. Note that the condition in the 
perfect CSI case is more stringent than in the training setting. That is, if the channel is such that 
$E[D_{\mathrm{eff}}] - h_t \to \infty$, then it automatically ensures that $\frac{E[D_{\mathrm{eff}}]}{1-\eta^*} - h_t \to \infty$. 

C. Discussion 

The analysis in Sec. IV-A and IV-B reveals that the following conditions are critical: 
C1) The channel coherence dimension, $N_c$, scales with SNR according to $N_c \sim \frac{1}{\mathrm{SNR}^{\mu}}$, $\mu > 1$, 
and 

C2) The independent degrees of freedom (DoF), $D$, in the channel scales with SNR such that 
$\frac{E[D_{\mathrm{eff}}]}{1-\eta^*} - h_t = \frac{D e^{-h_t}}{1-\eta^*} - h_t \to \infty$ as $\mathrm{SNR} \to 0$. 

With only an average power constraint, C1 is necessary and sufficient so that $C_{\mathrm{train},1,\mathrm{LT}}(\mathrm{SNR}) \to 
C_{\mathrm{coh},1,\mathrm{LT}}(\mathrm{SNR})$. In particular, with $\lambda \to 1$, we approach the perfect CSI benchmark. When there 
is an instantaneous power constraint, we need to satisfy both C1 and C2 so that the benchmark 
can be attained. 

We now study the implications of these conditions. Note that C1 predicates a certain minimum 
channel coherence level to ensure the fidelity of the training performance. That is, the larger the 
value of $\mu$ and hence, $N_c$, the easier it is to meet the benchmark. On the other hand, C2 
describes the required growth rate in the DoF, $D$, so that $E[D_{\mathrm{eff}}] - h_t \to \infty$ and the instantaneous 
power constraint is satisfied without any rate loss. That is, the larger the value of $D$, the 
easier it is to meet the benchmark. It is clear that the two conditions are somewhat conflicting 
in nature since for a richer channel, it is easier to increase $D$ but more difficult to increase 
$N_c$, while for a sparser channel, it is the reverse. Therefore a natural question is whether they can be 
satisfied simultaneously. 

To understand this, we first study the achievability of C1. What are the conditions on the 
channel parameters ($T_m$, $W_d$, $\delta_1$ and $\delta_2$) and how do they interact with the signal space parameters 
($T$, $W$ and $P$) so that $\mu > 1$ is feasible? As we discuss next, by leveraging delay and Doppler 
sparsities and using peaky signaling (when necessary), $\mu > 1$ is achievable. 

B1) Rich multipath: When the channel is rich in both delay and Doppler, $N_c = \frac{1}{T_m W_d}$ is fixed 
and does not scale with SNR. Thus we can never maintain the scaling relationship in $N_c$ as in 
Theorem 2 and C1 can never be satisfied. Therefore, we cannot attain the benchmark even under 
an average power constraint. 

B2) Doppler sparsity only: In this case $W_{\mathrm{coh}} = \frac{1}{T_m}$ is fixed and the scaling in $N_c$ is only 
through $T_{\mathrm{coh}} \sim f_2(T)$ (see (15)). Therefore, by scaling $T$ with $W$ according to $T \sim f_2^{-1}(W^{\mu})$ 
and choosing $\mu > 1$, we have $N_c \sim T_{\mathrm{coh}} \sim f_2\left(f_2^{-1}(W^{\mu})\right) \sim \frac{1}{\mathrm{SNR}^{\mu}}$. For the power-law scaling 
in (16), we obtain 

$$T \sim W^{\frac{\mu}{1-\delta_1}}. \tag{76}$$

Note that as $\delta_1$ increases and the channel gets richer, $T$ increases monotonically in (76). 

B3) Delay sparsity only: In this case, $T_{\mathrm{coh}} = \frac{1}{W_d}$ and $N_c = W_{\mathrm{coh}} T_{\mathrm{coh}}$ scales with SNR 
only through $W_{\mathrm{coh}} \sim f_1\left(\frac{1}{\mathrm{SNR}}\right)$. Therefore, for any sub-linear function $f_1$ we cannot satisfy 
$\mu > 1$. A possible solution to overcome this difficulty is to use peaky signaling where training 
and communication are performed only on a subset of the $D$ coherence subspaces. Modeling 
peakiness as in [4], [13] and defining $\zeta = \mathrm{SNR}^{\gamma}$, $\gamma > 0$ as the fraction of $D$ over which signaling 
is performed, it can be shown [13, Lemma 3] that the condition for asymptotic coherence gets 
relaxed to $N_c = \frac{1}{\mathrm{SNR}^{\mu_{\mathrm{peaky}}}}$ from the original $N_c = \frac{1}{\mathrm{SNR}^{\mu}}$ where $\mu_{\mathrm{peaky}} = \mu + \gamma$. We require 
$\mu_{\mathrm{peaky}} > 1$ which is the same as $\mu > 1 - \gamma$. For the power-law scaling in (16), we have 
$N_c \sim f_1(W) \sim W^{1-\delta_2} \sim \frac{1}{\mathrm{SNR}^{1-\delta_2}}$. Thus, if the peakiness coefficient $\gamma$ satisfies $\gamma > \delta_2$, we can 
satisfy the desired condition. 

B4) Delay and Doppler sparsity: Using (15), we have $W_{\mathrm{coh}} \sim f_1(W)$ and $T_{\mathrm{coh}} \sim f_2(T)$. 
Therefore, if we scale $T$ with $W$ according to 

$$T \sim f_3(W) \quad \text{with} \quad f_3(x) = f_2^{-1}\left(\frac{x^{\mu}}{f_1(x)}\right), \tag{77}$$

we have $N_c = W_{\mathrm{coh}} T_{\mathrm{coh}} \sim f_1(W) f_2(f_3(W)) = f_1(W) f_2\left(f_2^{-1}\left(\frac{W^{\mu}}{f_1(W)}\right)\right) \sim \frac{1}{\mathrm{SNR}^{\mu}}$. Thus with 
$\mu > 1$ in (77), we attain the desired scaling of $N_c$ with SNR. For the power-law scaling in (16), 
the desired scaling in $N_c$ can be obtained by choosing $T$, $W$ and $P$ according to the following 
canonical relationship that is obtained using (16) in (77) 

T = K m d ' _/ . (78) 

From the above discussion, it is clear that channel sparsity is necessary and in addition we also 
require a specific scaling relationship between $T$ and $W$ as defined in (78). But this is necessary 
for achieving the benchmark capacity with an average power constraint (satisfying C1). We now 
study how this scaling law impacts the scaling of $D$ with SNR, as in the instantaneous power 
case. This is critical in determining the achievability of C2, which we discuss next. We recall 
that by definition 

$$D = \frac{TW}{N_c} = TW \, \mathrm{SNR}^{\mu}. \tag{79}$$

Using (78) in (79) and simplifying, we obtain the induced scaling behavior on $D$ with SNR as 

$$D \sim \mathrm{SNR}^{\frac{\delta_1(1-\mu) - \delta_2}{1-\delta_1}}. \tag{80}$$

Therefore, we have $E[D_{\mathrm{eff}}] - h_t = \mathrm{SNR}^{\lambda + \frac{\delta_1(1-\mu) - \delta_2}{1-\delta_1}} + \lambda \log(\mathrm{SNR})$ and consequently 

$$E[D_{\mathrm{eff}}] - h_t \to \begin{cases} \infty & \text{if } 0 < \lambda < \frac{\delta_2 + (\mu-1)\delta_1}{1-\delta_1} \\ -\infty & \text{if } 1 > \lambda > \frac{\delta_2 + (\mu-1)\delta_1}{1-\delta_1}. \end{cases} \tag{81}$$

It is easily seen that 

$$\frac{\delta_2 + (\mu-1)\delta_1}{1-\delta_1} > 1 \iff \mu > \frac{1-\delta_2}{\delta_1} \tag{82}$$

which yields $E[D_{\mathrm{eff}}] - h_t \to \infty$ for all $\lambda \in (0, 1)$, and C2 is satisfied as desired. The special 
cases of delay sparsity only and Doppler sparsity only (as in B2 and B3) are simple extensions 
and follow naturally. 

To summarize, 

$$\mu > 1 \implies \text{C1 is achievable} \tag{83}$$

$$\mu > \frac{1-\delta_2}{\delta_1} \implies \text{C2 is achievable.} \tag{84}$$

Therefore, 

$$\mu > \max\left(1, \frac{1-\delta_2}{\delta_1}\right) \implies \text{C1 and C2 are achievable.} \tag{85}$$

We now elucidate the optimal packet configurations for different levels of channel sparsity. 
Analogous to the discussion in Sec. III-D, we focus on the power-law scaling and illustrate 
rules of thumb for choosing $T$ and $W$ for a given $N = TW$. Assuming symmetrical sparsity 
($\delta_1 = \delta_2 = \delta$), we note the following two cases: 

$$\text{Case 1: } \frac{1-\delta}{\delta} > 1 \iff \delta < 0.5, \quad T \sim W^{\rho}, \; \rho > \frac{1-\delta}{\delta} \tag{86}$$

$$\text{Case 2: } \frac{1-\delta}{\delta} < 1 \iff \delta > 0.5, \quad T \sim W^{\rho}, \; \rho > \frac{\delta}{1-\delta}. \tag{87}$$

The corresponding packet configurations are shown in Fig. 4 for $\delta \to 0$, $\delta = 0.5$ and $\delta \to 1$. 
It is observed that the slowest scaling in $T$ with $W$ is obtained for $\delta = 0.5$ when the DoF 
follow a square-root scaling law with signal space dimension. On either extreme of this 
square-root law, the required scaling in $T$ with $W$ only gets worse. This conclusion is expected and 
is consistent with the contradictory requirements presented by C1 and C2. When $\delta < 0.5$, the 
channel conditions are more favorable towards scaling $N_c$ as a function of SNR (specified by 
C1). However, the required scaling of $D$ with SNR (specified by C2) is non-trivial and ultimately 
dominates the required scaling of $T$ with $W$. On the other hand, when $\delta > 0.5$, the relatively less 
sparse channel conditions are favorably disposed towards the scaling of $D$ as a function of SNR, 
but this is at the cost of scaling in $N_c$. For the case of asymmetrically sparse channels, it can be 
shown that this desirable condition (slowest scaling of $T$ with $W$) generalizes to $\delta_1 + \delta_2 = 1$. 
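
The two cases above can be tabulated with a short Python sketch. It is an illustration under stated assumptions: the function names are hypothetical, and the map from $\mu$ to $\rho$ assumes the power-law model $N_c \sim T^{1-\delta_1} W^{1-\delta_2}$ with unit constants together with $N_c \sim W^{\mu}$, so that $T \sim W^{\rho}$ with $\rho = (\mu - 1 + \delta_2)/(1 - \delta_1)$.

```python
def mu_threshold(delta1, delta2):
    """Infimum of mu meeting both C1 and C2, following (85)."""
    return max(1.0, (1.0 - delta2) / delta1)


def rho_from_mu(mu, delta1, delta2):
    """Exponent rho in T ~ W^rho induced by N_c ~ W^mu.

    Assumes N_c ~ T^(1 - delta1) * W^(1 - delta2) with unit constants.
    """
    return (mu - 1.0 + delta2) / (1.0 - delta1)


def rho_threshold(delta):
    """Symmetric case delta1 = delta2 = delta, matching (86) and (87)."""
    return rho_from_mu(mu_threshold(delta, delta), delta, delta)


if __name__ == "__main__":
    for delta in (0.1, 0.25, 0.5, 0.75, 0.9):
        print(f"delta={delta:.2f}  mu>{mu_threshold(delta, delta):.2f}  "
              f"rho>{rho_threshold(delta):.2f}")
```

Under these assumptions the threshold on $\rho$ is smallest at $\delta = 0.5$ and grows toward either extreme, and in the asymmetric case the two arguments of the maximum in (85) coincide exactly when $\delta_1 + \delta_2 = 1$.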

Fig. 4. Optimal packet configurations in the non-coherent scenario with limited feedback. Three cases illustrated here are rich 
multipath ($\delta \to 1$), medium sparsity ($\delta = 0.5$) and very high sparsity ($\delta \to 0$). 

V. Concluding Remarks 

In this paper, we studied the achievable rates of sparse multipath channels with limited 
feedback. The focus of our analysis is in the wideband/low-SNR regime. Our investigation 
includes constraining both the average and the instantaneous transmit powers. We first analyzed 
the case when the receiver has perfect CSI and when one bit (per channel coefficient) of this CSI 
is known perfectly at the transmitter. We established conditions under which the rates achievable 
with this scheme approach the capacity with perfect receiver and transmitter CSI. For sparse 
channels, these conditions translate to certain optimal packet configurations for signaling. When 
the receiver has no CSI a priori, we studied the performance of a training scheme. It is shown 
that with only an average power constraint, channel sparsity is necessary to attain the coherent 
performance. With an instantaneous power constraint, we established conditions on optimal packet 
configurations in order to approach the benchmark capacity gain asymptotically as $\mathrm{SNR} \to 0$. 



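TABLE I 

Conditions necessary to achieve the perfect CSI benchmark of $\log\left(\frac{1}{\mathrm{SNR}}\right) \mathrm{SNR}$. 

| CSI Rx. | CSI Tx. | Power Const. | Conditions | Signaling Parameters |
|---------|---------|--------------|------------|----------------------|
| Perf. | Perf. | | Threshold $\sim \log\left(\frac{1}{\mathrm{SNR}}\right)$ | Waterfilling; see [2], [17] |
| Perf. | 1 bit | Avg. | $h_t = \lambda \log\left(\frac{1}{\mathrm{SNR}}\right)$ | No constraints on richness or $T$, $W$; see [2], [17], [18] |
| Perf. | 1 bit | Inst. | $h_t = \lambda \log\left(\frac{1}{\mathrm{SNR}}\right)$ for $\lambda < 1$, and $E[D_{\mathrm{eff}}] - h_t \to \infty$ | Rich channel: no constraint on $T$ or $W$. Sparse ($T$ fixed): $\lambda < \delta_2$ limits rates. Sparse (general): $T \sim W^{\rho}$, $\rho > \frac{1-\delta_2}{\delta_1}$ |
| Train. | 1 bit | Avg. | $N_c \sim \frac{1}{\mathrm{SNR}^{\mu}}$, $\mu > 1$ | Rich channel: Impossible. Sparsity (Doppler): Non-peaky scheme with $T \sim W^{\frac{\mu}{1-\delta_1}}$. Sparsity (delay): Peaky scheme with peakiness coefficient $\gamma > \delta_2$. Sparsity (both): Non-peaky scheme; see (77) and (78) |
| Train. | 1 bit | Inst. | $N_c \sim \frac{1}{\mathrm{SNR}^{\mu}}$, $\mu > 1$ and $\frac{E[D_{\mathrm{eff}}]}{1-\eta^*} - h_t \to \infty$ | Rich channel: Impossible. Sparse (both): $\mu > \frac{1-\delta_2}{\delta_1}$ for no rate loss, else $\lambda < \frac{\delta_2 + (\mu-1)\delta_1}{1-\delta_1}$ |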


We contrast the results of this work with recent observations in [17], [18]. The focus in [17], 
[18] is on training schemes and on scenarios where $T_{\mathrm{coh}}$ increases as SNR decreases, although 
there is no mention of how such a scaling law can be realized in practice. In particular, the authors 
show that capacity scales as $\log(T_{\mathrm{coh}})\,\mathrm{SNR}$ if $\log(T_{\mathrm{coh}}) \prec \log\left(\frac{1}{\mathrm{SNR}}\right)$ and equals the coherent 
capacity, $\log\left(\frac{1}{\mathrm{SNR}}\right) \mathrm{SNR}$, when $\log(T_{\mathrm{coh}}) \succeq \log\left(\frac{1}{\mathrm{SNR}}\right)$. On the other hand, we have shown that 
when the channel is sparse, channel coherence scales naturally with $T$ and $W$ and the benchmark 
gain, $\log\left(\frac{1}{\mathrm{SNR}}\right)$, can always be achieved by appropriately choosing $T$ and $W$. Furthermore, while 
[17], [18] considered only an average power constraint, we have established achievability under 
both average and instantaneous power constraints. Also, peaky training schemes are necessary in 
the framework of [17] to achieve perfect training performance. Such schemes would violate any 
finite instantaneous power constraint. Our findings here reveal that channel sparsity is a degree 
of freedom that can be exploited to obtain near-coherent performance with non-peaky training 
schemes. Table I provides a short summary of our contributions and places them in the context 
of [2], [17], [18]. 

Finally, we note that the results obtained here closely parallel our earlier work [13] where we 
studied the achievable rates with training and no feedback. We showed that when $N_c = \frac{1}{\mathrm{SNR}^{\mu}}$ with 
$\mu > 1$, the channel is asymptotically coherent; channel estimation performance is near-perfect at 
a vanishing energy cost. Analogous to [13], we have shown here that under the assumption of 
an error-free $D$-bit feedback link, the rate achievable with the training scheme converges to the 
perfect CSI benchmark. Furthermore, the cost of feedback, measured in terms of the number of 
feedback bits per dimension ($D/N$), converges asymptotically to zero in a sparse channel. 



Appendix 

A. Tightness of $\widetilde{C}_{\mathrm{coh},1,\mathrm{LT}}(\mathrm{SNR})$ to $C_{\mathrm{coh},1,\mathrm{LT}}(\mathrm{SNR})$ as $\mathrm{SNR} \to 0$ 

Let Xi denote the random variable x(l^i| 2 > M- Defining 7 
have 



A k7coM,LT(SNR)-C roh ,i,LT(SNR) 



C coM ,lt(SNR) 



7 



< 
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< 
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N r D 2 e~ h 
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(88) 



(89) 



(90) 



(91) 



(92) 
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where (a) follows from the log-inequality and (b) from the fact that {hi} are i.i.d. Conditioning 
on xi, we now have 



TP 



7o = 



N c De- h 



-E[xi]E^ li{x . J>1} 



M 



De-^-(l + ^ j>lXj ) 



SNR • B hu{x . jj>1} 



\hi\ 



De -w _ (1 + E Xj) 



(l + E;>iXi) (i + 



= SNR • E hl 



1 , TP|fei| 2 
x iV c De" h t 



• E {xj <j>iy 



N c De~ h t 

De~ ht - (1 + Ej>i Xj) 



< SNR.E[|/n| 2 ].E toj>1} 



(1 + E^iX,) 
L>e- h '-(l + £. >lX ,0 



A 

= 7i 



(93) 



(94) 



(95) 



(96) 



where (a) follows from the fact that $h_1$ and $\{\chi_j, j > 1\}$ are independent. 

To show the closeness of $\widetilde{C}_{\mathrm{coh},1,\mathrm{LT}}(\mathrm{SNR})$ to $C_{\mathrm{coh},1,\mathrm{LT}}(\mathrm{SNR})$, we now produce an upper bound 
for $\gamma_1$ that tends to 0 as $\mathrm{SNR} \to 0$. Our goal is to show that given any choice of $D$, $\gamma_1 / \mathrm{SNR}$ is 
bounded. Consider 



E 



{xjj>n 



De~ ht - (1 + E,>i Xj) 



(1 + E^iX,) 



E 



{x j j>i} 



; i + E j >iXj) 



- 1 



(a) 
< 



A3 






^(l + E^iX^J 


(i + E,>iX,) 



=72 



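where (a) is a consequence of the Cauchy-Schwarz inequality. Let $\mathcal{E}$ denote $e^{-h_t}$. We then have 

$$\gamma_2 \overset{(b)}{\le} \sqrt{1 + D^2 \mathcal{E}^2 \cdot E\left[\frac{1}{\left(1 + \sum_{j>1} \chi_j\right)^2}\right] - \frac{2D\mathcal{E}}{1 + (D-1)\mathcal{E}}} \tag{97}$$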



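where in (b) we have used the fact that $E\left[\frac{1}{X}\right] \ge \frac{1}{E[X]}$ for a positive random variable $X$. We 
now estimate $\alpha = E\left[\frac{1}{\left(1 + \sum_{j>1} \chi_j\right)^2}\right]$. It is easy to check that 

$$\alpha = \sum_{i=0}^{D-1} \binom{D-1}{i} \frac{\mathcal{E}^i (1-\mathcal{E})^{D-1-i}}{(1+i)^2}. \tag{98}$$

Noting that 

$$\sum_{i=0}^{D-1} \binom{D-1}{i} y^i = (1+y)^{D-1} \tag{99}$$

and integrating twice both sides of (99) with respect to $y$, we have 

$$\frac{(1+y)^{D+1} - (D+1)y - 1}{D(D+1)} = \sum_{i=0}^{D-1} \binom{D-1}{i} \frac{y^{i+2}}{(i+1)(i+2)}. \tag{100}$$

Using $y = \frac{\mathcal{E}}{1-\mathcal{E}}$ in (100), we have 

$$\sum_{i=0}^{D-1} \binom{D-1}{i} \frac{\mathcal{E}^i (1-\mathcal{E})^{D-1-i}}{(i+1)(i+2)} = \frac{1 - (1-\mathcal{E})^{D+1} - (D+1)\mathcal{E}(1-\mathcal{E})^{D}}{D(D+1)\mathcal{E}^2}. \tag{101}$$

Observe that $\frac{1}{(i+1)^2} \le \frac{2}{(i+1)(i+2)}$ for all $i \ge 0$ and an upper bound for $\gamma_2$ is 

$$\gamma_2 \le \sqrt{1 + \frac{2D^2\mathcal{E}^2}{D(D+1)\mathcal{E}^2} - \frac{2D\mathcal{E}}{1 + (D-1)\mathcal{E}}} = \sqrt{\frac{D^2\mathcal{E} - 4D\mathcal{E} + 3D - \mathcal{E} + 1}{(D+1)(D\mathcal{E} - \mathcal{E} + 1)}} \tag{102}$$

which is bounded for any choice of $D$. (In fact, the upper bound converges to 1 as $D \to \infty$.) 
Note that the bound in (102) is loose and one might expect that $\gamma_2 \to 0$ as $D \to \infty$ as a 
consequence of the law of large numbers. However, for our purpose, the proposed loose upper 
bound in (102) is sufficient. 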
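
The bounding step for $\alpha$ can be sanity-checked numerically. The Python sketch below is illustrative and its function names are hypothetical. It assumes, as in the i.i.d. setting of this appendix, that $\sum_{j>1}\chi_j$ is Binomial$(D-1, \mathcal{E})$ with $\mathcal{E} = e^{-h_t}$. It computes $\alpha$ by direct summation and compares it with twice the closed-form value of $E\left[\frac{1}{(1+X)(2+X)}\right]$, which follows from the binomial theorem.

```python
import math


def alpha_exact(d, e):
    """E[1 / (1 + X)^2] for X ~ Binomial(d - 1, e), by direct summation."""
    n = d - 1
    return sum(
        math.comb(n, i) * e**i * (1.0 - e) ** (n - i) / (1 + i) ** 2
        for i in range(n + 1)
    )


def alpha_upper(d, e):
    """2 * E[1 / ((1 + X)(2 + X))] in closed form, an upper bound on alpha."""
    n = d - 1
    q = 1.0 - e
    tail = 1.0 - q ** (n + 2) - (n + 2) * e * q ** (n + 1)
    return 2.0 * tail / ((n + 1) * (n + 2) * e * e)


if __name__ == "__main__":
    for d, e in ((10, 0.5), (50, 0.2), (200, 0.05)):
        print(d, e, alpha_exact(d, e), alpha_upper(d, e))
```

Dropping the two negative terms in the closed form gives the simpler bound $\alpha \le \frac{2}{D(D+1)\mathcal{E}^2}$, which is the quantity that appears inside the square root of the bound on $\gamma_2$.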

B. Proof of Proposition 1 

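To compute $p_L = \Pr\left(\sum_{j=1}^{D} \chi(|h_j|^2 > h_t) \le A D e^{-h_t}\right)$, we need the following result [27, 
Theorem 2.8, p. 57] on the tail probability of a sum of independent random variables. 

Lemma 1: Let $X_i$, $i = 1, \cdots, n$ be independent random variables with $E[X_i] = 0$ and 
$E[X_i^2] = \sigma_i^2$. Define $B_n = \sum_{i=1}^{n} \sigma_i^2$. If there exists a positive constant $H$ such that 

$$\left|E[X_i^m]\right| \le \frac{m!}{2} \sigma_i^2 H^{m-2} \tag{103}$$

for all $i$ and all integers $m \ge 2$, then for $x \ge \frac{B_n}{H}$ we have $\Pr\left(\sum_{i=1}^{n} X_i \ge x\right) \le \exp\left(-\frac{x}{4H}\right)$. If $0 \le x \le \frac{B_n}{H}$, then we have 
$\Pr\left(\sum_{i=1}^{n} X_i \ge x\right) \le \exp\left(-\frac{x^2}{4B_n}\right)$. ■ 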
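
As a numerical illustration of Lemma 1, the Python sketch below applies the bound to centered Bernoulli indicators, the case used in the remainder of this proof. It is a sketch under stated assumptions: the function names are hypothetical, the moment condition is taken in the form $|E[X^m]| \le \frac{m!}{2}\sigma^2 H^{m-2}$ for $m \ge 2$, and the indicators are i.i.d. with success probability $e$ (playing the role of $e^{-h_t}$), so that their sum is binomial and the exact tail is available for comparison.

```python
import math


def central_moment(e, m):
    """m-th central moment of a Bernoulli(e) indicator, from its two-point law."""
    return e * (1.0 - e) ** m + (1.0 - e) * (-e) ** m


def tail_bound(x, b_n, h):
    """Lemma 1 bound on Pr(sum of centered variables >= x), for x >= 0."""
    if x >= b_n / h:
        return math.exp(-x / (4.0 * h))
    return math.exp(-x * x / (4.0 * b_n))


def exact_tail(n, e, k):
    """Pr(Binomial(n, e) >= k), by direct summation."""
    return sum(
        math.comb(n, i) * e**i * (1.0 - e) ** (n - i) for i in range(k, n + 1)
    )


if __name__ == "__main__":
    n, e = 50, 0.1
    h = 1.0 - e                 # choice of H used in the proof
    b_n = n * e * (1.0 - e)     # sum of the n variances
    for k in (8, 10, 15):
        x = k - n * e           # deviation of the count from its mean
        print(k, exact_tail(n, e, k), tail_bound(x, b_n, h))
```

The bound is loose for moderate deviations, but it decays exponentially in the deviation, which is all that the proof requires.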
To apply Lemma 1, we set $n = i$ and $X_j = \chi(|h_j|^2 > h_t) - E\left[\chi(|h_j|^2 > h_t)\right] = \chi_j - e^{-h_t} = \chi_j - \mathcal{E}$ 
for $j = 1, \cdots, i$. Then, a simple computation of the higher moments of $X_j$ implies 
that $E[X_j^2] = \sigma_j^2 = \mathcal{E}(1 - \mathcal{E})$, $B_i = i\mathcal{E}(1 - \mathcal{E})$, $E[X_j^m] = \mathcal{E}(1 - \mathcal{E}) \cdot \left((1 - \mathcal{E})^{m-1} + (-1)^m \mathcal{E}^{m-1}\right)$. 
It can be checked that $H = (1 - \mathcal{E})$ is sufficient to satisfy the conditions of Lemma 1. With this 
setting, we have 



Pr [J2x(\h J \ 2 >h t )-tE>(AD-t)E) < 




(AD~i)E 
' 4(1-E) 

(AD-ifE 
4i(l-E) 



if i < L^J , 



(104) 



ED 
i=lPi 



If 1 < A < 2, with k = 4(!_ E ) using (104), the following lower bound, L, holds for ^g 1 



L = 1 



(a) x 



> 1 



-AD 



-k(AD-I) . ( e «L^J _ 1) 



(AD-z) 2 k 





AD 


(»- 


2 



-(A-1) 2 Dk 



_i_ . e-K^- 1 ) + (1 + £)(i _ A/2)) e -^ 2DK 



(105) 

(106) 
(107) 



where (a) follows by first using ^ AD . - > (A — 1) 2 D for all 1 < i < D and then upon further 
simplification using the sum of a geometric series. 

If A > 2, we have the following lower bound to Pl : 



L — 1 — exp(-ADn) e ™ ~ 1 - e- ,s(D(A - 1) - 1) • ^r^- 



(108) 



KKD 



SNR 



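With $h_t = \lambda \log\left(\frac{1}{\mathrm{SNR}}\right)$ as in (33), the dominant term of $\mathcal{E}$ is $\mathrm{SNR}^{\lambda}$ and hence in $\kappa$ is $\frac{\mathrm{SNR}^{\lambda}}{4}$. 
With this choice of $h_t$ in (107) and (108) and simplifying, we obtain the desired bounds in (49) 
and (50). It is also straightforward to check that when $D$ satisfies $D\,\mathrm{SNR}^{\lambda} + \lambda \log(\mathrm{SNR}) \to \infty$ 
as $\mathrm{SNR} \to 0$, $L \to 1$ in both the cases. ■ 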



C. Completing the Proof of Theorem 2 

The choice of $h_t$ we study is $h_t = \epsilon \log\left(\frac{1}{\mathrm{SNR}}\right)$ for some $\epsilon > 0$. First, with this fixed choice 
of $h_t$, note that maximizing $C_{\mathrm{train},1,\mathrm{LT}}(\eta, N_c, \mathrm{SNR})$ is equivalent to setting its derivative (with 
respect to $\eta$) to zero. Then, it is straightforward to check that the derivative is 
vpK , h t / (1 - v )(l + ??iV c SNR)h t SNR 

'"be I ' 



7] V e \ (1 — ^)SNR + K1K2 



I II 



+ 



{"> ~ i) 

SNR V 



Kl 1 1 " iVe J l O^F + ^ C SNR(1-,)J - SNR(ht + 1} 



111 



h t SNR 2 7V c?7 AT C SNR 2 (1 - n f - ^ 2 (1 + „SNRJV C ) (l + ±^L) 

+ — ■ . (109) 

(1 - ?7)SNR + re^ (l-r ? )SNR + /t 1 /t 2 + (l-r7)(l + r7iV c SNR)h t SNR V 



IV 



For simplicity, we will denote the four terms in (109) by I, II, III and IV. We will further assume 
that $\eta = \mathrm{SNR}^{x}$, $x > 0$ and $N_c = \frac{1}{\mathrm{SNR}^{y}}$, $y > 0$. For a given choice of $\epsilon$, our goal is to determine 
the relationship between $x$ and $y$ such that the derivative in (109) can be zero. We consider three 
cases: i) $y > 1 + x$, ii) $y < 1 + x$ and iii) $y = 1 + x$. 

Case i: First, note that r]A^ c SNR = SNR~ Z for some z > 0. The dominant terms of (3 can be seen 
to be gNF |i_ e + elog (§^r) and thus, up to first order f3 = SN pi- e - Similarly, (1 - ??)SNR + kik 2 
up to first order equals SNR e ~ z . Note from [25, 5.1.20, p. 229] that v p = O (^j if (5 -> 00 
and hence I is elog (5^) SNR£ 1 +a; -i . It can also be checked that II is (elog (g^)) 2 SNR «+*-i . 
up — ^ = O \ an d hence III is elog (5^) SNRe +^-i as long as y < 1 + 2x. Under the same 
assumption, y < 1 + 2x, IV is — (elog (5^)) 2 SNRe 1 +j; -i . Thus, by playing with constants the 
derivative can be set to zero in this case. If y > 1 + 2x, I and II remain unchanged, but III is 
SNR 2+x- ?/ - e and | V is _ e i og SNR 2+a; ~^ e . By comparing the coefficients, we see that the 
only way the derivative can be zero is if y — 1 + 2x. 

Case ii: In this case, the first order terms show the following behavior. With w = 1+x — y > 0, 

I is SUR W ~ X , II is elog log log (^) III is -SNR 2 ^ * and IV is SNR 2 " 2 ^. 

It can be seen that the derivative can never be zero and hence this case is ruled out. 

Case iii: In this case, based on a similar analysis, we see that the derivative can again be set to 

zero. 

Therefore, if e G (0, 1), x > and 1 + x < y < 1 + 2x, we have 

~ / elog(^)SNR 1 - £ (l-SNR x )\ 

CW,lt(SNR) > SNR^log 1 + {SNR \_ SN ^ y " + SNR. (110) 
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Thus, C trainjliLT (SNR) is up to first order the same as C cohl)LT (SNR) and C coh 1LT (SNR). If 
y = 1 + x and r]A^ c SNR = a for some choice of a (positive, finite and independent of SNR), we 
need a > ^ and we have 

<? train)1 , LT (SNR) > SNR^ log + eSNR 1 -^ log (gj^)) + ^ ' SNR. (Ill) 

If $y < 1 + x$, the training scheme is strictly sub-optimal (in the limit of SNR) from an ergodic 
capacity point-of-view. Putting things together, we obtain the desired condition, $\mu > 1$. ■ 
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